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Preface 


DECISION THEORY APPLIES TO STATISTICAL PROBLEMS THE PRIN- 
ciple that a statistical procedure should be evaluated by its consequences 
in various circumstances, a principle first clearly emphasized by Neyman 
and Pearson in their theory of testing hypotheses. The extension of this 
principle to all statistical problems was proposed by A. Wald in 1939, 
and developed by him in a series of papers culminating in his book, 
Statistical Decision Functions, Wiley, 1950. The importance of Wald’s 
approach has been widely recognized, and the decision-theory viewpoint 
has been adopted in much recent research in statistics. This book is 
intended primarily as a textbook in decision theory for first-year grad- 
uate students in statistics. . 

Wald’s mathematical model for decision theory is a special case of 
that for game theory, as introduced by E. Borel in 1921 and more gener- 
ally by J. von Neumann in 1928, and expanded in the definitive book 
by von Neumann and O. Morgenstern, Theory of Games and Economic 
Behavior, Princeton, 1944. Many of the results in von Neumann and 
Morgenstern’s book, e.g., the reduction of games to normal form, the 
minimax theorem, and the utility theorem, and much of the research 
stimulated by that book are of basic importance for decision theory, so 
that our book begins with a treatment of the relevant parts of game 
theory. Occasionally we have permitted ourselves the luxury of discuss- 
ing topics in game theory not strictly relevant to statistics: perfect- 
information games, for example. For an excellent treatment of numer- 
ous aspects of game theory not treated here, the reader is referred to 
J. ©. C. MeKinsey’s Introduction to the Theory of Games, McGraw-Hill, 
1952. 

Many have contributed to the development of game theory and deci- 
sion theory, and in most cases the reader will find in the references indi- 
cated at the end of the book a more complete treatment of the topics 
presented here. 

The mathematical prerequisite for reading this book is a course in 
elementary analysis, covering such topics as limits of sequences, uniform 
convergence, the Riemann integral, and the Heine-Borel theorem. For 
a few sections, some knowledge of matrices, determinants, and linear 
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dependence is required. The statistical concepts used are defined in the 
book, but the reader who has no previous knowledge of statistics may 
find some parts difficult. Only discrete distributions appear in the for- 
mal development, but continuous distributions, e.g., the normal, are 
used for illustrative purposes. For a fuller treatment of the probability 
and statistical concepts used here, the reader is referred to two excellent 
books: W. Feller, An Introduction to Probability Theory and Its Applica- 
tions, Wiley, 1950; and A. M. Mood, Introduction to the Theory of Sta- 
tistics, McGraw-Hill, 1950. 

We are indebted to the Office of Naval Research, for its continuing 
support which made it possible for us to write this book. We are also 
indebted to our former colleagues at the Rand Corporation, countless 
discussions with whom helped to clarify our ideas, to Herman Rubin and 
Patrick Suppes, whose participation amounts practically to co-author- 
ship, to J. C. C. McKinsey for numerous improvements, particularly in 
Chapters 1 and 2, to Oscar Wesler for introducing a great deal of clarity 
and rigor in the final revision of this book, to Rosedith Sitgreaves and 
Russell N. Bradt for substantial contributions, to Gladys Garabedian 
for preparing the figures for the draftsman, and to many other people 
for assistance and criticisms. We are grateful to Phyllis Winkler, who 
typed two versions of this book, performing 1-1 transformations of nota- 
tion and deciphering cryptic instructions from the authors, without ever 
losing her sunny disposition. 

Davip BLACKWELL 


M. A. Girsuick 
February 1954 
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CHAPTER 1 


Games in Normal Form 


1.1. Introduction 


A game is characterized by a set of rules having a certain formal struc- 
ture, governing the behavior of certain individuals or groups, the players. 
Chess and bridge are examples of games in the sense considered here, 
and will be used for illustrative purposes. 

Broadly speaking, the rules provide that the game shall consist of a 
finite sequence of moves in a specified order, and the nature of each 
move is prescribed. Moves are of two kinds, personal moves and chance 
moves. A personal move is a choice by one of the players of one of a 
specified, possibly infinite, set of alternatives; for instance, each move 
in chess is a personal move; the first move is a choice by White of 1 of 
20 specified alternatives. The actual decision made in a particular 
play of a game at a given personal move we shall call the choice at that 
move. A chance move also results in the choice of one of a specified 
sct of alternatives; here the alternative is selected not by one of the 
players, but by a chance mechanism, with the probabilities with which 
the mechanism selects the various alternatives specified by the rules of 
the game. For instance, the first move in bridge consists of dealing 
the first card to a specified player. This is a chance move with 52 al- 
ternatives; the rules require that each alternative shall have proba- 
bility 145 of being selected. The actual selection made in a particular 
play of a game at a given chance move we shall call the outcome at that 
move. 

In terms of moves, the rules of a game have the following structure. 
For the first move, the rules specify whether it is to be a personal move 
or a chance move. If it is a personal move, the rules list the available 
alternatives and specify which player is to make the choice; if it is a 
chance move, the rules list the available alternatives and specify the 
probabilities with which they are to be selected. For moves after the 
first, say the kth move with k > 1, the rules specify, as a function of 
the choices and outcomes at the first k — 1 moves (a) whether the kth 


move is to be a personal move or a chance move, (b) if a chance move 
$ 
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the alternatives and their probabilities of selection, and (c) if a personal 
move, the alternatives, the player who is to make the choice, and the 
information concerning the choices and outcomes at the first k — 1 
moves that is given to him before he makes his choice. Finally, the 
rules specify, as a function of the choices and outcomes at the succes- 
sive moves, when the game shall terminate and the score, not neces- 
sarily numerical, that is to be assigned to each player. 

Various points should be noted in connection with the above descrip- 
tion. (1) For k > 1, the characteristics of the kth move depend on the 


Figure 1 


results of previous moves. In bridge, for instance, when the first three 
bids are “pass,” the move after the fourth bid will be a chance move, 
i.e., a new deal, if the fourth bid is “pass,” and a personal move, i.e., 
a bid by the next player, if the fourth bid is anything other than “pass.” 
(2) The information given to a player when he is to make a certain move 
does not necessarily include the information that was given to him at 
an earlier move. In bridge, for instance, two partners are best consid- 
ered as a single player; each partner has complete knowledge of his own 
hand throughout the play, but not of his partner’s hand. (3) The num- 
ber of moves occurring in an actual play is not fixed in advance but de- 
pends on the successive outcomes and choices. The rules must guaran- 
tee, however, that the game will terminate eventually. (4) The con- 
cept of move is to some extent indeterminate. For instance, the deal 
in bridge may be considered either as a single chance move or as a suc- 
cession of 52 chance moves. 

An interesting graphical representation of a game can be given in 
terms of a tree as is illustrated in Figure 1. In this figure the vertices 
represent various positions for the players as the result of the outcomes 
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of a sequence of moves. No vertex is occupied by more than one player. 
Play begins at a specified vertex O, and each move is a change in posi- 
tion from a vertex to one of the adjoining higher vertices. When any 
vert®x without branches is reached, the play of the game is over. Each 
non-terminal vertex has a label (in Figure 1, the labels I and II are for 
players I and II, respectively), specifying who is to move if that vertex 
is reached, and the number of alternatives is simply the number of 
branches at the vertex. The information available to the players is 
specified by a partition $ of the vertices into sets V, called information 
sets, such that, when a vertex ve V is reached, the player who is to 
move is told only the set V, and not, unless V contains only a single 
element, the exact position v. Certain obvious requirements are im- 
posed on the information partition $: (1) all ve V must have the same 
mover and the same number of alternatives, and (2) if vı e V and vəs is 
attainable from vı, then v2 V. In the game given by Figure 1, the 
information partition $ contains five information sets as members, and 
three of these, V1, V3, Vs, are sets containing only one element. When 
every information set V of a game contains only one element, the game 
is said to be a game of perfect information. This special class of games 
is discussed in Section 1.7. 

A formal mathematical description of the structure of a game in 
terms of the concepts of moves and information will not be attempted 
here. One of the most important results in the theory of games is the 
fact that all games, no matter what their detailed formal structure, 
may be reduced to an exceedingly simple form, the so-called normal 
form. In the next two sections we give an informal description of how 


this reduction is effected. 


1.2. The Concept of a Strategy 


Imagine that you are to play the White pieces in a single game of 
chess, and that you discover you are unable to be present for the occa- 
sion. There is available a deputy, who will represent you on the occa- 
sion, and who will carry out your instructions exactly, but who is ab- 
solutely unable to make any decisions of his own volition. Thus, in 
order to guarantee that your deputy will be able to conduct the White 
pieces throughout the game, your instructions to him must envisage 
every possible circumstance in which he may be required to move, and 
must specify, for each such circumstance, what his choice is to be. 
Any such complete set of instructions constitutes what we shall call a 
strategy. Thus, a strategy for White must specify a first move and, 
for each possible reply by Black, a corresponding next move and, in 
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general, for each possible sequence of choices my, «+ -, Max, k > 0, a choice 
c(m, +++, mex) for his (k + 1)st move (the [2k + 1]st move of the 
game) among the alternatives available to him as a result of the situa- 
tion Mı, ***, Mz. Of course, the specification of a strategy that has a 
reasonable chance of winning against an opponent who will actually 
be present would be a formidable job, mainly because you would have 
to make a tremendous number of decisions, most of which would turn 
out to be irrelevant in any particular play of the game, since most of 
the situations for which decisions had to be made will simply not turn 
up. For instance, a White player who is actually present must make 
two decisions in order to make his first two moves, whereas one who is 
operating by deputy must make 21 decisions to guarantee that his 
deputy will be able to make two moves. Thus the use of strategies 
would make the actual play of a single chess game a major undertak- 
ing; nevertheless, we shall see that the concept of strategy leads to 
great simplification for theoretical purposes. 

In general, a strategy for a given player in a given game consists of a 
specification, for each situation that could arise in which he would be 
required to make a personal move, of his choice in that situation. It 
is to be emphasized that the choice of a strategy imposes no theoretical 
handicap on a player; specifically he is not required to make any deci- 
sion on the basis of less information than he would have in the actual 
play at the stage where this decision was required, since the decisions 
whose totality constitutes a strategy are conditional ones of the form 
“Tf a situation should arise in which I am informed that it is my move 
and I am given information z about the results of previous moves 
(chance and personal) and I am told that the set of alternatives open 
to me is S, my choice will be s (a point of S).” Properly speaking, the 
information z about the results of previous moves includes the informa- 
tion that it is his turn to move and the specification of the alternatives 
open to him, since these latter facts in general constitute information 
about the results of previous moves. With the understanding that in- 
formation z includes specification of S and of the fact that it is his turn 
to move, we may describe a player’s strategy as a listing of all possible 
z’s and an assignment to each z of an s in the set S of available alterna- 
tives specified by z, i.e., if Z is the space of possible z’s (in terms of the 
tree, the space of all information sets V with the player’s label) and 
S(z) is the set of alternatives specified by z (the set of branches leading 
out of a vertex of the particular information set), the player’s strategy 
is a function f such that f(z) € S(z) for all z e Z (i.e., f is a rule that speci- 
fies, for each information set, which branch is to be selected). The 
space of all possible strategies f for a given player will be designated by 
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n 

F. (The number of elements in F is clearly given by [I Ti where n is the 
izi 

number of information sets with the player’s label and r; is the number 

of alternatives for the ith information set.) 

Let Zı, Zə be the Z spaces for White and Black in chess, and let fı, 
fo be any strategies for White and Black respectively. A referee, given 
fi and fo, could conduct the entire game without further instructions 
from either player and announce the result. For the information z; 
that no previous moves have been made, that it is now White’s turn, 
and that S(z,) consists of the 20 legal opening moves for White is a 
point in Z, so that mı = fi(z:) e€ S(z1) is White’s opening move. The 
information zə that White’s opening move was mm, that no other moves 
have been made by either player, that it is now Black’s turn to move, 
and that S(z2) consists of the 20 legal moves for Black at this stage will 
be an element of Zə, so that fo(z2) = mə will be Black’s reply, ete. 
Thus the outcome of the game is determined uniquely by the strategies 
fı, fe chosen by the players. 

The fact noted above for chess, that the outcome of the game is de- 
termined once each player has selected a strategy, will clearly continue 
to hold for any game in which each move is a personal move by some 
player, but will fail for bridge, for instance, where chance moves occur 
and influence the outcome. In discussing chance moves, we make the 
conceptual simplification corresponding to that of strategy for personal 
moves of a given player. Let W be the set of all possible cireumstances 
w in which a chance move would be required. Each w includes specifi- 
cation of the nature of the chance move to be performed, i.e., of the set 
S(w) of alternatives among which one is to be selected, and for each w 
of the probability distribution p over S(w) according to which the se- 
lection of se¢S(w) is to be made. Suppose that a referee performs in 
advance each chance experiment that may be required in the course of 
the game, i.e., for each w he selects an element s e S(w) according to the 
distribution p, the selections for different w’s being independent. Now 
a particular selection of an s from S(w) for each w constitutes a function 
h defined on W with h(w) eS(w). (In terms of the tree, h is a rule 
which specifies, for each information set labeled “chance move,” which 
branch is selected.) Also, the distributions p, together with the re- 
quirement of independent selection, determine an over-all distribution 
P on these functions h. Consequently, the totality of experiments per- 
formed by the referee may be viewed as a single experiment, namely, 
selecting a function he 3¢ according to the probability function Pp. 
Thus, just as the totality of all decisions to be made by a given player 
can be described by a single decision—the choice of a strategzgy—so can 
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the totality of chance experiments to be performed be replaced by a 
single over-all chance move. 

We can now describe quite simply the structure of a game. Suppose 
there are k players, and let F; be the space of possible strategies f; for 
player 7, 3€ the space of possible outcomes h of the over-all chance ex- 
periment, and G the space of (k-+1)-tuples g = (fı, +-+, fr, h), where 
fie Fa hex. Any (k+1)-tuple ge G determines uniquely the course 
of the play. For each play the rules of the game determine a unique 
outcome r belonging to the space of outcomes R, and consequently for 
each game there is a function g defined over the space G mapping G 
onto R; i.e., the range of q is the entire space R. For fixed f = (fi, 
-++, fx), the probability distribution P, according to which h is selected, 
determines a probability distribution Q over G, and thus a probability 
distribution zy over R. Hence, every game can be described by k 
spaces Fy, ---, Fy, a space R, and the association with each k-tuple 
(fi, +++, fe), Jie Fi, of a probability distribution zy over R. The game 
is played as follows: Player i chooses a point f; € F;, the k choices be- 
ing made simultaneously and independently; a point re R is then se- 
lected according to the probability distribution my associated with the 
k choices (fi, -++, fy). 

An analysis of the following simple game will provide an illustration 
of the concepts introduced in this section. Player I moves first and 
selects one of the two integers 1, 2. The referee then tosses a coin, and, 
if the outcome is “head,” he informs player II of player I’s choice, and 
not otherwise. Player II then moves and selects one of the two inte- 
gers 3, 4. The fourth move is again a chance move by the referee and 
consists of selecting one of the three integers 1, 2, 3 with respective 
probabilities 0.4, 0.2, 0.4. The numbers selected in the first, third, and 
fourth moves are then added, and that amount in dollars is paid by II 
to I if the sum is even, and by I to II if the sum is odd. 

The tree of this game is shown in Figure 2. The numbers at the ter- 
minal vertices represent the possible winnings in dollars of the two 
players. The symbol 0 is used to designate chance moves. 

In terms of the two spaces F, and F3, the strategies of the two players 
can be described as follows: The space Fı consists simply of the two 
single element sets (1) and (2). That is, 


Fı = {(), (2)} 


The space Fo consists of eight ordered triples (¢, j, k) with 7,7, k = 3, 
4; ie., 


Fo = {@, 3, 3), (3, 3, 4), (3, 4, 3), (8, 4, 4), (4, 3, 3), (4, 3, 4), 
(4, 4, 3), (4, 4, 4)} 
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where the first position in the triple is conditioned upon the coin falling 
head and player I choosing 1, the second position in the triple is condi- 
tioned upon the coin falling head and player I choosing 2, and the third 
position in the triple is conditioned upon the coin falling tail. The 
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Figure 2 


space JC and the values of the probability distribution P for each h e 5¢ 
are given by 


I: {(H,1), (H,2), (H,3), (T,1), (T,2), (T,3)} 
P: { 0.2 0.1 0.2 02 Ol 02} 


where the letter H stands for head and T for tail. The space G of all 
triples g = (fi, fe, h) clearly contains 96 elements. The space of out- 
comes Æ consists of the 5 elements $5.00, $6.00, $7.00, $8.00, $9.00. 
Typical values of the function g which maps G onto the space of out- 
comes is 


q(g) = $5.00 for g = ((1), (8, 3, 4), (H, 1)) 
a(g) = $8.00 for g= ((1), (4, 4, 3), (H, 3)) 
a(g) = $7.00 for g= ((2), (3, 4, 4), (T, 1)) 
a(g) = $9.00 for g= ((2), (4, 3, 4), (T, 3)) 
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These values are most easily computed by referring to Figure 2. The 
tree representation of the game also facilitates the computation of the 
probability distribution my on R for each f = (fı, fe). Thus, for ex- 
ample, if f = ((1), (3, 3, 4)), & 


77;($5.00) = my($8.00) = 0.2;  ~p($6.00) = ($7.00) = 0.3; 


a;($9.00) = 0 
and, if f = ((2), (3, 4, 4)), 


ay ($5.00) = m/($6.00) = 0; a ($7.00) = ms($9.00) = 0.4; 
($8.00) = 0.2 


PROBLEMS 


1.2.1. In the above illustrative example compute 7;(r) for each f and r, and pre- 
sent the results in tabular form. 

1.2.2, From the results of Problem 1.2.1, compute the expected gain for player I 
for each f, and present the results in a 2-by-8 matrix with the rows representing the 
strategies of player I and the columns the strategies of player II. 


1.2.3. Reduce the following game to the matrix form as in the previous problem, 
and sketch its tree: 


Move 1. Player I chooses i = 0 or 1. 

Move 2. A chance move, selecting j = 0 or 1 with equal probabilities. 

Move 3. Player II chooses k = 0 or 1. 

Outcome: If i +j +k =1, I pays II one unit; otherwise II pays I one unit. 

Information of II at move 3: II is told the value of j but not the value of i. 

1.2.4. Reduce to matrix form and sketch the tree for the variations of Problem 
1.2.3, with the same moves and outcomes, but with the following information for II 
at move 3. 

(1) II is told neither the value of 7 nor the value of j. 

(2) TI is told the values of both i and E i 

(3) TI is told the value of i + 

(4) II is told the value of the 
i and j). 


j (but not the separate values of i and A 
product of 7 and j (but not the separate values of 


1.3. The Normal Form 


So far we have not mentioned the motivations of the players—an 
essential element of any game. The description of a game at the end 
of Section 1.2 was simply a listing of the strategies of the players and 
the outcome of any set of choices of strategies, without regard to the 
attitudes of the players toward various outcomes. We now indicate 
briefly how the final simplification of a game—the normal form—is ob- 
tained, by taking into account the preferences of the players. 

We have seen in Section 1.2 that the result of any set of strategies 
fist frisa probability distribution ry over the set R of possil 


ble out- 
comes. It would be particularly convenient if a given player could ex- 
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press his preference pattern in R by a bounded numerical function u 
defined on R, such that he prefers rı to rə if and only if u(r1) > u(re) 
(so that u(rı) = u(r2) denotes indifference between 7; and r2) and such 
that, if for any probability distribution £ over R, we define U(é) as the 
expected value of u(r) computed with respect to $, i.e., 


U(é) = 2 E(r)u(r) 


he prefers £; to £ if and only if U(£1) > U(t). It is a remarkable fact 
that, under extremely plausible hypotheses concerning the preference 
pattern, such a function v exists. The function U defined for all proba- 
bility distributions ¢ over R, is called the player’s utility function. It 
is unique, for a given preference pattern, up to a linear transformation. 
We shall discuss this question in some detail in Chapter 4; for the 
present we shall simply suppose that each player has such a utility 
function. 

Thus the aim of each player in the game is to maximize his expected 
utility. If U; is the utility function of player z, his aim is to make 
Milfi, «++ fx) = Ui) as large as possible, where zy is the probability 
distribution (for fixed fı, +*+, fz) over R determined by the over-all 
chance move. We are now in a position to give a description of the 
normal form of a game: A game consists of k spaces Fy, ---, Fp and k 
bounded numerical functions M;(fı, «++, fe) defined on the space of all 
k-tuples (fi, «+, fe), fie Fa t= 1, +++, k. The game is played as fol- 
lows: Player í chooses an element f; of F; the k choices being made 
simultaneously and independently; player 7 then receives the amount 
Mah, -ees Jo, i= 1, a & The aim of player č is to make M; as 
large as possible. The statement “Player 7 then receives the amount 
Mlay +++ Sa)” is a shorthand way of saying “‘a situation results whose 
utility for player i is Mi(fi, «++, Je)?” We shall often speak as though 
the changes in utility are effected simply by money receipts or payments. 
This terminology is used because of its convenience and suggestiveness. 

In the theory of games it is usual to treat first a special class of games, 
the two-person zero-sum games. The theory of these games is par- 
ticularly simple and complete, and only these games will be treated in 
this book. A two-person game is of course a game with k = 2; a zero- 


sum game is one in which D Mah, +++) fe) = 0 for all fy, ++, fas 
j=l 


more precisely, since each M; is unique only up to linear transforma- 
tions, a game is zero-sum if there is a determination of My, ---, Mp for 
k 


which J Mi(fi, «++, fx) = 0 for all fi, +++, fe. Thus a two-person 


i=l 
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zero-sum game is a game between two players in which their interests 
are diametrically opposed: One player gains only at the expense of the 
other. Consequently, there is no motive for collusion between the 
players; it is precisely the fact that collusion is unprofitable that sim- 
plifies the theory. N 

Notice that a constant-sum game, i.e., one in which 2 Mi(fi, +++, fe) 


= c for all fi, ---, f is zero-sum in the sense defined above, since an 
alternative choice of utility functions is M*, = M, — c, M*; = M; for 
k 


i= 1, and >) M*;=0. Thus the theory developed below for zero- 
1 


sum two-person games applies equally to constant-sum two-person 
games. A slightly different kind of game falling within our definition 
of zero-sum games is illustrated by the following example. Player I has 
two strategies, 1 and 2, and player II has a single strategy 1*, If I 
chooses 1, nothing happens; if he chooses 2, he loses $1 and II wins 
$10. If we suppose that I prefers the result of 1 to that of 2 and that 


II has the opposite preference, an appropriate choice of utility functions 
yields 


Mi(l, 1*) = M:(1, 1*) =0, My (2, 1*) = -1, M2(2, 1*) = 1 


and the game is zero-sum. According to th 
rational play for I in this situation is to 
nothing happens. 

What often does happen in situations of this kind is that II approaches 
I, offering him a fee of, say, $5 if I will choose strategy 2. If I accepts 
this offer, the result is that I nets $4, II nets $5, and both have im- 
proved their positions. This means simply that our original descrip- 
tion of the game was incomplete, as we failed to list certain strategies, 
involving offering and acceptance of fees, which were in fact available 
to the players. A more complete description, taking these possibilities 
into account, would yield a non-zero-sum game. Thus it is important, 
in applying game theory to an actual situation, to be sure to list all 
available strategies for the players. 

Since for a two-person zero-sum game we have 


Melfi, f2) = —Mi(f,, fe) 


we need specify only M,. The term “game” will hereafter always 
mean “zero-sum two-person game.” Thus we have 

Definition 1.3.1. A game G in the normal form is a triple (X, Y, M Dy 
where X, Y are arbitrary spaces and M is a bounded numerical function 


e theory developed below, 
choose strategy 1, so that 
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defined on the product space X X Y of pairs (zx, y), reX,yeY. The 
points x (y) are called strategies for player I (II), and the function M 
is called the payoff. 


The game G is played as follows: I chooses x e X, II chooses y e Y, 
the choices being made independently and simultaneously. II then 
pays I the amount M (x, y). 

Note that, although it was useful, for the understanding of the argu- 
ment leading to the simplification of a game to the normal form, to 
restrict the spaces X and Y to be finite, this restriction is removed in 
Definition 1.3.1 and will not be assumed, unless specified otherwise, in 
the remainder of this book. 


1.4. Equivalent Games 


If in a given game one relabels the strategies of either player, the 
new game is not essentially different from the old: Every statement 
about either game can be translated into a corresponding statement 
about the other, and we wish to consider the two games as equivalent. 
Another simple transformation which does not alter the essential char- 
acter of a game is the deletion of duplicated strategies; in other words, 
if player I has two strategies x, t2, such that M(x, y) = M(t, y) for 
all y, the deletion of x2 from X is an inessential change in the game, 
even though it might, for example, destroy such properties as sym- 
metry. 


Definition 1.4.1. Let Gi = (Xi, Yı, My) and Gs = (Xə, Yo, Mo) 
be two games. Then G% is a reduction of G, written: Gor Gi, if either 

(a) X2 = Xi, and there is a function f from Y, onto Yo, such that 
My (x, y) = Malz, f(y)) for all ze X, ye Yı, or 

(b) Yə = Yı, and there is a function g from X, onto Xe, such that 
My (a, y) = Me(g(z), y) for all xe Xy, ye F}. 


If f is a 1—1 transformation, Gə is obtained from G; by a relabeling 
of strategies; if f is not 1—1, G2 is obtained from G, by deletions of 
certain duplicated strategies and relabeling. 


Definition 1.4.2. Two games G and G” are called equivalent, written 
G ~ G', if there is a finite sequence of games Go, Gy, «++, Ga with Go 
= G, G, = G’, and, for each 7 = 1, ---, n, either Gi r G; or GirG 


fte 
An example of two equivalent games is the following. Let G = 
(X, Y, M) be a game, where X = (2, ---, zy), a set of N real numbers; 
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Y = (yı, ++, yr), also a set of R real numbers; and, for xe X, ye Y, 


Med =F 


Then G’ = (X’, Y’, M’), where x’ = az and y’ = by with a and b posi- 
tive constants, and 
ey’ 


eee Tete 


is equivalent to G. 


Definition 1.4.3. A game G = (X, Y, M) is called finite if both X and 
Y contain only a finite number of elements. 


A finite game G = (X, Y, M) with X = (z1, +++, am), Y = (yw, 
-++, Yn) is clearly equivalent to the game G’ = (Im, In, M’), where I, 
denotes the set of positive integers 1, ---, k, and M’(é, j) = M (i; y;). 
This equivalence leads to the following useful definition. 


Definition 1.4.4. If G = (X, Y, M) is a finite game with X = (2, 
++, Um) and Y = (y1, +--+, Yn), then the m X n matrix A = || ai; || 
with a;; = M(a;, yj) is called the matrix of the game G. 


We remark that most properties of games in which we shall be inter- 
ested are invariant under equivalence. By means of the transformation 
function f used in defining the concept of reduction, any statement 
about a game G can be translated into a statement about an equivalent 
game G”. Thus, although from a logical point of view a game G could 
be defined as a class of equivalent triples (X, Y, M), we shall not make 
this abstraction but shall consider each triple as constituting a game 
and, when convenient, replace any game by an equivalent one. A few 


theorems on invariance under equivalence are proved in later sections 
of this chapter. 


PROBLEMS 


1.4.1. Two games with matrices A and B are equivalent if B is obtained from A 
by a permutation of rows or columns. 

1.4.2. If Xı C X, a sufficient condition that the game Gi = (Xj, Y, My) with 
Miz, y) = M(z, y) for ce Xı, ye Y be equivalent to G = (X, Y, M) is that for 
every xe X there is an zı e X; with M(x, y) = M(z, y). 

1.4.3. Call G = (X, Y, M) completely reduced if 11 # x2 implies M(x, y) ¥ M (za, y) 
and yı # yz implies M(x, yı) # M(z, y). Every game is equivalent to a completely 
reduced game. (First reduce the game by deleting all duplicated strategies in X; 


then delete all duplicated strategies in Y. The resulting game is completely re- 
duced.) 
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1.4.4. Two games Gi and G2 are called isomorphic if there exist 1-1 mappings f 
and g of Xı onto Xə and Yı onto Yə respectively, such that Mi(x1, yı) = Mo(f(e), 
g(y1)) for all zı £ Xj, ye Yı. Two games are equivalent if and only if their com- 
pletely reduced forms are isomorphic. 

1.45. If G' ~G and G’ is reduced, there is a game G such that Gi r G and G” r Gi. 

1.4.6. Problems 1.4.3 and 1.4.5 imply that the n of Definition 1.4.2 need never 
exceed 4. Give an example where n must be at least 4. 

1.4.7. Show that the condition of Problem 1.4.2 is also necessary if either X or F, 
when duplicated strategies are deleted, is finite. 


1.5. Illustrative Examples 


To illustrate the concepts of normal form and equivalence we give 
several examples. 

Game G;. Matching Pennies. Players I and II simultaneously place 
coins on a table. If the coins agree, i.e., both show heads or both tails, 
II pays I one unit. If not, I pays II one unit. 

Clearly each player has two strategies—heads and tails. The game 


1 -l 
is equivalent to one with matrix ( 1 J 


Game Gə. Matching Pennies with Spying. This game is like match- 
ing pennies, except that I is required to place his coin first, and II is 
permitted to see the result before placing his own coin. 

I still has two strategies—heads and tails. A strategy for IT now 
specifies his choice when he sees heads and his choice when he sees tails, 
so that II has four strategies. Denoting heads by 1, tails by 2, and by 
(i, j) the strategy that chooses 7 when I chooses 1, and j when I chooses 
2, we obtain the matrix 


(1,1) (1,2) (2,1) (2,2) 
1 ( 1 1 =f Al ) 
2 —1 1 =i 1 
Game G;(p). Matching Pennies with Imperfect Spying. After I 
makes his choice, a coin is tossed that has probability p of showing I’s 
choice and 1 — p of showing the opposite. The result of the toss is 


revealed to II, who then makes his choice. 
Again I has two strategies and II has four. The matrix is 


GD (,2) (2,1) (2,2) 


al lL 2p=—1 1=—2p an 
2 =f 2p—1 L=2p 1 
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where (i, j) now denotes the strategy: II chooses 7 when 1 is announced, 
j when 2 is announced. 

Game G,(k, N). Addition. I and II alternately choose integers, each 
choice being one of the integers 1, ---, k, and each choice being made 
with knowledge of all preceding choices. As soon as the sum of the 
chosen integers exceeds N, the last player to choose pays his opponent 
one unit. 

The situation in which player I finds himself at his rth move is de- 


scribed by a sequence s, = (ii, i2, «++, 427-2), with each 7; being one 
of the integers 1, ---, k and 

2r—2 

LysNn 

j=l 


Denote by S, the set of possible sequences s,, where r = 1, --+,[N/2] + 1 
and [z] denotes the largest integer not exceeding z. A strategy x for I 
consists of a set of [N/2] + 1 functions fi, +++, fiw/2}41, Where f, is a 
function defined on S, assuming only the values 1, 2, ---, k: f, specifies 
I’s rth move when the previous history of the play is s,. Similarly, a 
strategy y for II is a set of [(N + 1)/2] functions gı, +++, Iw +1)/2)) 


where g, is defined for the set T, of all sequences t, = (41, +++, tor—1) 
with each 7; being one of the integers 1, 2, ---, k and 

2r—1 

DYAN 


j=1 
Define ù (x, y) = fı and, inductively for j > 0, 
ta;(x, y) = giù (x, y), +++, i2j—1(2, y)) 
tzj+1(T, Y) = fji (T, y), 4, to3(z, y)) 
(this induction describes the manner in which a referee would carry 


out the instructions of the players) and let j*(zx, y) be the largest j for 
which 7;(a, y) is defined. Then 


M(x,y)= 1 if j*(z,y) is even 
—1 if j*(z,y) is odd 


This game is again discussed in Section 1.7. 


PROBLEMS 


1.5.1. Is G3(1) equivalent to G2? Is G3(1/2) equivalent to G1? 

1.5.2. Let G4(k, N) be modified so that each choice is made with knowledge only of 
the sum of all previous choices. Is the modified game equivalent to the original 
game? 
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1.5.3. Let G = (X, Y, M) be any game. We modify G by introducing an irrelevant 
move as follows. Player I chooses an integer z = 1 or 2; his choice is announced to 
II. After this preliminary, the original G is played: x and y are chosen independ- 
ently, and II pays I the amount M(z, y). What is the normal form of the modified 
game? Is it equivalent to the original game? 


1.6. Lower and Upper Pure Value 
In a game G = (X, Y, M), the consequences of a strategy xo are de- 
scribed by the function M (zo, y). Using xo, player I is certain to re- 


ceive at least Agel) = inf M(2o, 1) 
yeY 


and cannot be certain of any definite larger amount (where for any set 
S of real numbers, inf S and sup S denote the largest number so with 
so < s for all se S and the smallest number sı with sı > s for all se S 
respectively). Thus the number 


A*a = sup Ag(z) 
zex 
is the upper limit to the amount I can guarantee getting: for every 


e > 0, he can, simply by choosing a suitable x, be certain of at least 
A*a — ¢, and for no e > 0 is there an x which makes him certain to re- 


ceive at least \*¢ + ¢ against all y. Similarly, we define 


* = 


To(yo) = sup M(z, yo), v*e = inf Te(y) 
zex yeY 


by selecting a y suitably, player II can with certainty restrict his loss 
to v*g + e but not to vřg — efor any « > 0. For subsequent reference 
these definitions are stated formally. 


Definition 1.6.1. If G@ = (X, Y, M) is a game, then, for zy eX, 
Ag(xo) = inf M(2o, Y). 
yeY 
Definition 1.6.2. If G = (X, Y, M) is a game, then, for ye F, 
Te(yo) = sup M(z, Yo). 
ze 
Definition 1.6.3. If G = (X, Y, M) is a game, then the lower pure 
value of G is the number 
A*g = sup Ac(x) = sup inf M(z, y) 
zex zex yeY 
Definition 1.6.4. If G = (X, Y, M) is a game, then the upper pure 
value of G is the number 


u*g = inf Te(y) = inf sup M(z, y) 
veY veY zeX 
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Theorem 1.6.1. If G = (X, Y, M) is a game, then, for every to ¢ X 
and yo £ Y, 

Ac(zo) < Tayo) and dA*g < v*g 

Proof. Ag(to) < M(zo, yo) < T(yo). Thus A*g < Tel(yo), for all 
Yo £ Y, and A*g < v*g. 

Consider now any game G. No method of play for I can guarantee 
him more than v*g, since II can restrict his loss to v*g, and no method 
of play for II can with certainty reduce his loss below \*g, since I can 
guarantee this amount. Thus, if A*¢ = vřg = v, say, no method of 
play can guarantee either player any improvement over v, and we have 
seen that each player can attain v (more precisely, approximate v as 
closely as he wishes). Thus, for such games, choosing an xo with Ag(xo) 
= v is an unimprovable method of play for I in the sense that no method 
of play can guarantee more, and similarly for II. This situation leads 
to the following definitions. 


Definition 1.6.5. IfG = (X, Y, M) isa game, and if \*g = v*g = vg, 
then the number vg is called the pure value of G. 


Definition 1.6.6. If G = (X, Y, M) is a game, and if vg is the pure 
value of G, then a good strategy for player I in G is any x) e€ X, with 
Ag(%o) = vg, and a good strategy for player II in G is any yo £ Y, with 
To(yo) = va. 


Theorem 1.6.2. If two games Gi = (Xj, Yı, Mı) and Gs = (Xo, Yo, 
Mo) are equivalent, then A*¢, = A*g, and v*g, = v*g,. 

Proof. It is sufficient to prove the theorem in the special case where 
one of the games is a reduction of the other. Suppose for definiteness 


that Gs is a reduction of G, and that f is a function mapping X, onto 
Xə. Since, for all z e X, and all ye Y, (=Y2), 


M(x, y) = M2(f(@), y) 
we have 
inf M(x, y) = inf M2(f(), y) 
ve Yi yve Y: 


Hence, for all x e X4, 


Aa,(2) = Ac,(f(x)) 
so that 


A*a, = sup Ag,(x) = sup Ag,(f(2)) = A*o, 
zeX, tex 


The proof that vřg, = v*g, is similar. 
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PROBLEMS 


1.6.1. If there are strategies zo, yo such that M(zo, y) = cı for all ye Y, M(x, yo) 
= co for all z e X, then cy = c2 = A*g = v*g. 

18.2. If there are strategies xo, yo and a number v such that M(x, y) > v > 
M(z, yo) for all z, y, then A*G = v*g = v, and 29, yo are good strategies for I, II. 

1.6.3. For any G = (X, Y, M) in which X, Y are closed bounded subsets of m- 
and n-dimensional Cartesian spaces and M is continuous on X X Y (i.e., M is uni- 
formly continuous on X X Y), Ag(z) and T¢(y) are continuous in z, y respectively. 

1.6.4. In any game G = (X, Y, M) in which Y is a closed bounded subset of n 
space and M is for each x continuous in y, there exists a y’ © Y such that Te(y’) = 
v*g. 

1.6.5. Show that, if G, and G2 are equivalent, and if a player has a good strategy 


in Gj, he also has a good strategy in Go. 
1.6.6. Find A*g and v*g for matching pennies, matching pennies with spying, and 


matching pennies with imperfect spying. 

1.6.7. I chooses a number z, 0 <x <1. II consists of two partners IT; and IIo. 
II; observes z, then chooses a number z, 0 < z < 1, and z, but not x, is communi- 
cated to IIe. Ile then chooses a number y, 0 < y <1, and II pays I the amount 
z+ | ay j Describe this game in normal form and evaluate \*g, v*g. Does 


either player I or II have good strategies? 

1.6.8. For any game G = (X, Y, M) let T(G) be the game played as follows: 
I chooses ze X and this x is announced to II. II then chooses ye Y and pays I 
the amount M(z, y). Describe T(G) in normal form. Show that A*7(g) = v*7@) 


= \*G. 
169. Let T(G) be defined as in Problem 1.6.8, and let U(G) be the corresponding 
game in which II moves first. Is T(G) equivalent (a) to T(T(G)), (6) to U(T(G))? 


1.7. Perfect-Information Games 


We have seen that some very simple games, e.g., matching pennies, 
do not have a pure value. Among the games that do have a pure value 
are the so-called perfect-information games of which chess, checkers, 
and tic-tac-toe are examples. In this section we shall consider such 
games in some detail. We remark, however, that the theorems and con- 
cepts discussed here, though interesting, are not needed for the under- 
standing of the remainder of the book, and may be skipped by the 
reader. , SON 

Essentially, a game of perfect information is one that can be de- 
scribed in terms of successive moves in such a way that, at each per- 
sonal move, the mover knows the choices and outcomes of all preced- 
ing personal and chance moves. In terms of the description of games 
in extensive form in Section 1.2, perfect-information games may be 
characterized as games in which every information set is a unit set. 
It is intuitively clear that this condition is equivalent to the require- 
ment that every branch of the tree of the game also be a tree of some 
game. This latter condition leads to an inductive definition for games 
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in normal form. In this definition, the order of a perfect-information 
game intuitively corresponds to the maximum number of moves in that 
game. 


Definition 1.7.1. A game G = (X, Y, M) is a perfect-information 
game of order 0, if and only if M(z, y) is constant; and a game G = 
(X, Y, M) is a perfect-information game of order n + 1, if and only if 
there is a set A and a class G4 of games Ga = (Xa, Ya, Ma) for ae A, 
such that each G, is a perfect-information game of order n, and such 
that either 

Case 1. X consists of all pairs z = (a, z) with ae A, ze Xa, Y con- 
sists of all functions y defined on A with y(a) e Yq for all a, and 


M((@, 2), y) = Malz, y(@)) or 


Case 2. Y consists of all pairs y = (a, z) with ae A, ze Ya, X con- 
sists of all functions x defined on A with x(a) e Xa for all a, and 


M(x, (a, z)) = Ma(x(a), z) or 


Case 3. X,Y consist of all functions z, y defined on A with x(a) e Xa, 
y(a) £ Ya for all a, and 


M(x, y) = 2 P(a)Ma(e(a), y(a)) 


where p(a) > 0, >> p(a) = 1. A game G is called a perfect-information 
aed 
game if G is a perfect-information game of order n for some n. 


Our inductive description corresponds to the fact that the result of 
the first move in a perfect-information game with n moves is a perfect- 
information game with n — 1 moves, so that the first move can be con- 
sidered as a choice of one of a given collection of perfect-information 
games with n — 1 moves, with the three possible cases corresponding 
to the cases in which the first move—the choice of a—is a personal 
move of I, a personal move of II, or a chance move. For any perfect- 
information game G of order n, the first l moves of G may be considered 
as a game of perfect information, whose outcome is however not a num- 
ber, but a game of perfect information of order n — k, so that the first 
k moves of G may be regarded as a struggle to determine which game 
with n — k moves shall be played. Our inductive definition is the case 
k = 1. Reference to the tree and matrix of the game in Problem 1.7.1 


will make the previous discussion clearer and help in an understanding 
of what follows. 
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Theorem 1.7.1. Every perfect-information game has a pure value. 
Moreover, if G = (X, Y, M) is a perfect-information game of order 
n + 1, and Gy is the class of perfect-information games Ga = (Xa, Ya, 
M,)2ae A, of order n, as required by Definition 1.7.1, then, correspond- 
ing to the three cases for G4 in Definition 1.7.1, the pure value vg of G 
is given by either 


Case 1. vg = sup v¢(a) or 
aca 

Case 2. vg = inf vg(a) or 
acA 

Case 3. ve = 2, pla)vela) 
aed 


where v¢(a) is the pure value of Ga. In addition, if there is a be A, 
such that vg(b) = ve, and, for every ae A, there are good strategies 
z*, € Xa, Y*a € Ya in Go, then good strategies x* e X, y* e Y exist in G 
and, corresponding to the three cases in Definition 1.7.1, are given by 
either 


Case 1. a* = (b, £*), y*(a) = y*a forall aeA 


(b, y*s), t*(a) = 2%, forall aed 


Case 2. yf 
Case 8. 2x*(a) = t*a y*(a) = y*a forall aed 
Proof. The theorem is obvious for n = 0, and we may suppose the 


theorem true for all perfect-information games of order less than n + 1. 
Case 1. Letv = sup vę(a). For any «> 0, choose x* = (b, z) and 
aeA 


y* such that 
vg (b) >uv-e 
Ac) > va(b) — € 
ea(y*(a)) < vala) + € 
Then, for all ye Y, 
(1) M(e*, y) = Mole, y(a)) 2 Aas) > v — 2e 
and, for all ze X, 
(2) M(a, y*) = Malta, y*(a@)) < Te.Y*@) < vala) te Sv te 
Thus G has pure value vg = v. Furthermore, if the supremum of v¢(a) 
is attained by some b e A and there are good strategies t*a € Xa, Y*a € Ya 


for every Ga, then, since equations 1 and 2 are valid for all e > 0, the 
choices z* e X, y* e Y are good strategies in G. The proofs in cases? 


and 3 are similar. 


20 GAMES IN NORMAL FORM Ch. 1 


The inductive description of the value and good strategies in perfect- 
information games can be used to solve such games. We illustrate the 
method by solving the game addition, denoted by G(k, N) and de- 
scribed in Section 1.5. If we denote by G’(k, N) the game addition in 
which player II moves first, we notice that the initial choice of a = 1, 
-++, k by I produces a situation equivalent to the game G’(k, N — a), 
and similarly for initial choices by II in G’(k, N), where G(k, N) and 
G'(k, N) are of order 0 for N <0. Thus G(k, N) and G@’(k, N) are 
perfect-information games of order N for N > 0. Denoting by v(k, N) 
the pure value of G(k, N), we have 


v(k, 0) = —1 
v(k, N) = 1 for N=1,-++-,k 
Since at each stage the set A contains the k integers 1, ---, k, we ob- 


tain, for N > 0, 
v(k, N) = max (v'(k, N — 1), +--+, v'(k, N — k)) 


and, since the value v'(k, N) = —v(k, N) for every integer N, we ob- 
tain inductively 


v(k, 1) = max (1, —1, -1, =, =s 
v(k, 2) = max (—1, 1, —1, +, —1)= 1 


v(k, k) = max (—1, —1, Shire, —l, l= 1 
o(k, k + 1) = max (—1, =1, -1,-+-, -1, -1) = -1 
o(k, k + 2) = max (1, —1, -1,---,-1) =1 

ete. 


Thus, for N > 0, o(k, N) = —1 if Nisa multiple of k + 1 and v(k, N) 
= 1 otherwise. If N is not a multiple of k + 1, the initial move in a 
good strategy for I is to choose a so that N — a is a multiple of k + 1, 
since this a maximizes —v(k, N — a); if N is a multiple of k + 1, all 
initial moves are equally worthless, 
If a finite game G = (X, Y, M) with matrix A has a pure value vg, 
both players will have good strategies, say, ae X , Yje Y. Thus, in 
accordance with Definition 1.4.4, vG = aix», and Í ' 


Qije L Qir L aix, for all i 


We may restate this inequality as follows: ixj» is the minimum element 
in its row and the maximum element in its column. Such 


i a t 
is called a saddle point of the matrix. an elemen 


Conversely, if a matrix A has a 
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saddle point a;s», then the game G = (X, Y, M) with matrix A has 
pure value vg = ajs, and xe X and yje Y are good strategies for 
players I and II respectively. Using Theorem 1.7.1, we obtain the fol- 
lowitfg result for perfect-information games. 


Theorem 1.7.2. The matrix of a finite game of perfect information 
has a saddle point. 


We remark, however, that by no means all matrices with saddle 
points correspond to perfect-information games; the matrices of per- 
fect-information games have a further property which we shall describe 
below. 

We begin with an example. Suppose that, as in G(2, 2), players I 
and II alternately choose integers, each choice being 1 or 2 and each 
made with knowledge of all preceding choices, until the sum of the in- 
tegers chosen exceeds 2. (The reader is advised to draw the tree for 
this game.) Strategies for both players may be described by listing 
those situations in which the player chooses the number 1; there are 
clearly four strategies for each player. For player I they are [1]: never 
choose 1; [2]: choose 1 only at first move; [3]: choose 1 only at second 
move; (4): always choose 1. For player II they are [1]: never choose 1; 
[2]: choose 1 only if I chose 1 on first move; [3]: choose 1 only if I chose 
2 on first move; [4]: always choose 1. The possible plays are (2, 2), 
(2, 1), (1, 2), (1, 1, 2), (1, 1, 1), and if the payoff to I for these plays is 
A, B, C, D, E, we have the matrix 


X 1) 2) Bl H S.C.E R T., Wes: Benga 
Dats 2.4.66... 

Hla A B B pon 

ae DaD Acc. } 04252... piis 

BI] 4 A B B 

wle H C B 


Theorem 1.7.2 asserts that, no matter what values are assigned to A, 
B, C, D, E, the resulting matrix will have a saddle point. For instance, 
the values A=2,B=4,C=1,D=0, E = 3, yield, after deleting 


row [3] as a duplication of row [1], the matrix 


Hy g B a 


I 
wlz 2 4 4 
gp] 1 0 1 0 
wili 3 1 8 
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which has the upper left element as a saddle point. Now instead of 
looking for saddle points, one could proceed as follows. A comparison 
of rows [2] and [4] shows that, no matter what II chooses, strategy [4] 
is at least as good as strategy [2], so that I may delete row [2] from con- 
sideration, yielding 


So a a a 


me 2 4 4 
[4] 1 3 1 3 
A comparison of the columns of this matrix shows that, against either 


of I’s remaining strategies, strategy [1] for II is at least as good as [2], 
[3], or [4], so that II may delete [2], [3], and [4] from consideration, 


leaving st 
IN u] 
[1] 2 
[4] 1 


Finally, against II’s strategy [1], I should play [1], yielding 


Wa 


u] 2 


Thus, not only is there a saddle point, but also we can discover it by 
successive deletion of “inferior” strategies. This is the property that 


all matrices of games of perfect information possess, as we shall show. 
On the other hand, the matrix 


23 4 
1 4 0 
1 0 6 


A : saddle point > the upper left corner, but no deletions of inferior 
strategies are possible, so that it cannot be thi ix fect- 
information game. ? RE RR 

f Theorem 1.7.3. If G = 
tion, the matrix of G may 
deletion of inferior (includi 


(X, Y, M) is a finite game of perfect informa- 
be reduced to a single element by successive 
ng duplicated) strategies. 
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Proof. If G is of order 0, the theorem is clear. Suppose the theorem 
true for all finite perfect-information games of order k < n, and let G 
be of order n. We consider the three cases of Definition 1.7.1. 

Cas® 1. (Player I makes the first move.) Let A = (1, +--+, 7), and 
let Bı, ---, B, be the matrices of Gy, ---, G,, where matrix Bz has ele- 
ments b(a, i, j), 1<i<ma,1<j< Nna. With the set S = (Bi, +++, 
B,) we associate a matrix T(S) whose rows are in 1—1 correspondence 
with the set of all couples (a, i), where ae A and 1 < i < ma; whose 
columns are in 1—1 correspondence with the set of all r-tuples (ji, 
“+, Jr), where 1 < ja < Na for 1 <a < r; and which is such that, if 
I corresponds to (a, i) and J to (jı, ---, j+), then the element try in the 
Ith row and Jth column of T is given by the formula 


trs = b(a, 4, ja) 


Then T(S) is the matrix of G, and the following assertions are easily 
verified: 

1. If i* is an inferior row in B,*, then J* = (a*, 2*) is an inferior row 
in T(S), and, if S’ is obtained from S by deleting row 7* from Ba», then 
T(S’) is obtained from T(S) by deleting row I*. 

2. If j* is an inferior column in B,s, then all columns J with jẹ» = j* 


are inferior columns in T(S), and, if S’ is obtained from S by deleting 


column j* from Bas, then T(S’) is obtained from T(S) by deleting all 
columns with ja» = j*. 

Thus assertion 1 associates with each permissible row deletion in any 
B a corresponding permissible row deletion in T, and assertion 2 asso- 
ciates with each permissible column deletion in any B a corresponding 
Set of permissible column deletions in 7. The induction hypothesis as- 
serts that there is a series of permissible deletions reducing S to a col- 
lection S* of r one-element matrices. The corresponding permissible 
deletions reduce T(S) to T(S*), which has only a single column, Clearly 
T(S*) can be further reduced to a single element—the maximum ele- 
ment of 7'(S*). 

Case 2. The proof in case 2 (player II makes the first move) is 
similar. 

Case 8. (The first move is a chance move.) Let A = (1, «++, 7), 
and, with the same notation as in case 1, let By, -- -, B, be the corre- 
Sponding matrices. Let p(a) be the probability with which a is se- 
lected; we associate with S the matrix U(S) whose rows are specified by 
r-tuples I = (i, «++, i+), 1 < ia < Ma, whose columns are specified by 
tuples J = (Jis +++) Jr), 1 S Ja L Na, with 


Urs = >, p(a)b(a, tia, ja) 
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The following assertion is easily verified: If i* is an inferior row in Bon 
then all rows in U for which i,» = 7* are inferior rows in U, and, if Ss 
is obtained from S by deleting 7* from Ba», U(S’) is obtained from U(S) 
by deleting from U(S) all rows with ią = 7*; similar facts hoid for 
columns. ; ; 

Thus, as in case 1, each permissible deletion in any B is associated 
with a corresponding set of permissible deletions in U; the induction 
hypothesis yields a series of deletions reducing S to a collection S* of 
one-element matrices, and the corresponding series reduces T(S) to 
T(S*), which is itself a one-element matrix. This completes the 
proof. ; 

Corollary. Let G = (X, Y, M) be a finite game of perfect informa- 
tion, and let M (£o, yo) be the single element to which the matrix of G 
reduces; then G has pure value vg = M (zo, yo) and Xo, yo are good strat- 
egies for players I, IT. 

Proof. The corollary follows from the fact that, if a matrix A, is 
obtained from A by deletion of an inferior strategy, any saddle point 
in A, is also a saddle point in A. Since the final single-element matrix 
M (zo, yo) has a saddle point at (xo, yo), (Xo, Yo) is also a saddle point 
of the original M (z, y). 

We remark that the proof and corollary of Theorem 1.7.3 yield, for 
finite games, a second proof of Theorem 1.7.2. 


PROBLEMS 


1.7.1. Draw the tree for the following game: An unbiased coin is tossed, and, if 
the outcome is “head,” player I chooses one of the two integers 1,6. If the outcome 
is “tail,” he chooses one of the two integers 2, 7. Player II then chooses one of the 
two integers 3, 9. The choices of the two players are added to yield the sum s. 
A coin is then tossed which has 0.8 probability of falling head and 0.2 probability 
of falling tail. If the outcome of the toss is head, II pays I s dollars; if the outcome 
is tail, I pays II s dollars. The outcome and choices at each move are known to 
both players. 

1.7.2. Compute the payoff matrix of the above game, and indicate which strate- 


gies for the players result in a saddle point. Show that the matrix may be reduced 
to a single element by successive deletion of inferior strategies, 


1.8. Mixed Strategies 


In matching pennies, we have v*ç = 1; either of II’s strategies has a 
counter by I which wins for I every time. In matching pennies with 
the extremely imperfect spying represented by p = 1/2 on the other 
hand, the matrix is 
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(1,1) €,2) (2,1) (2,2) 
1 ( 1 0 0 =f 
æ 2 =] 0 0 1 ) 

and u*g = 0: II can reduce his expected loss to zero simply by believ- 
ing that the face shown does represent I’s choice, i.e., choosing strateg; 
(2, 1), or by disbelieving, i.e., choosing (1,2). Notice that, since p =1/2, 
the coin that II observes could as well have been tossed in ignorance of 
Ts choice: II can achieve the same effect in the original matching- 
pennies game simply by tossing a fair coin and believing (or acting as 
if he believed) that the result represents I’s choice. 

Suppose, then, that, in matching pennies, II tosses a coin and bases 
his choice on the result of the toss. We may regard the coin toss either 
as a device employed by II for choosing a strategy in the original game 
or as a new chance move introduced into the game by II, whose result 
is observed only by him, and which has no influence on the payoff, 
except in so far as II bases his play on its result. We shall adopt the 
second point of view and extend every game by introducing four new 
moves at the beginning of the game: a personal move by player I— 
the selection of a chance device—and a chance move performed by the 
device selected by I, whose result is announced to him, and the similar 
moves for II. 

A strategy for player I consists of (1) choosing a chance device and 
(2) specifying for every outcome of the chance device selected in (1), a 
Strategy x in the original game. We have not yet specified the class of 
chance devices available to I; we shall restrict attention to chance de- 
vices with a countable (which we take to include finite) number of pos- 
sible outcomes and shall suppose that, for any sequence 


= (1), £2), <5),  &@) 20, 2) $1 


player I has available a chance device with different, possible outcomes 
Ti, 2, +++, with the probabilities (1), £(2), +-+. Part 2 of I’s strategy 
then assigns to each r; a corresponding x;¢ X. Now the exact nature 
of ry, 72, +++, is irrelevant except that they must be distinguishable, 


and I’s choice of ¢ = (E(1), (2), ---,) and (a, z2, ---,) is in effect a 


decision to choose tı, tz, «++, With probabilities £(1), (2), +--+. Thus 
a permutation of (£(1), §(2), + + +,) associated with the same permutation 
of (xı, xə, +--+) is in effect the same strategy as the original. Finally, 
we may suppose that 21, 22, -+ +, are distinct, since, for instance, (1/2, 


1/4, 1/4), (æ, x’, z”) has the same result as (3/4, 1/4), (2, 2”). A 
Strategy for I becomes, then, simply a probability distribution — over 
X, where by a probability distribution £ over X we mean the following. 
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Definition 1.8.1. A (discrete) probability distribution over a space 
X is a non-negative, numerical function ¢ defined over X such o 
(x) = 0 except for a countable number of 2’s in X and Ps t(x) = 


Similarly, a strategy for II is a probability disioi z — F, 


Definition 1.8.2. Let G = (X, Y, M) be a game. Then the game 
T = (Z, H, M), where =, H consist of all probability distributions &, n 
over X, Y, and 


MGE, n) = È M (z, y)E(@)n@) 


is called the mixed extension of G. 


We shall often use the same symbol M for the payoff in G and T. 
This is to emphasize the fact that T is an extension of G. The strategy 
& with (zo) = 1, (x) = 0 for z ¥ Zp we shall denote simply by zo. 
Thus the symbol M(x, y) denotes either the value of the original M 
at the point (x, y) of X X Y or the value of M at the point (t, n) of 
= X H, where (x) = 1 and n(y) = 1. Of course, the value is the same 
with either interpretation since we are describing the same thing: the 
payoff to I when he chooses x e X and II chooses y e Y. 


Definition 1.8.3. Let T = (=, H, M) be the mixed extension of the 
game G = (X, Y, M). Then the points xe X, ye Y are called pure 
strategies in G sind the points ¢ £ £, n e H mized strategies in G. 


Theorem 1.8.1. If G = (X, Y, M) is a game and T = (Z, H, M) is 
the mixed extension of G, then 


(a) Ay(é) = inf M(t, y) and A*e < A*r 
ye 
(b) Tr(a) = sup M(e, n) and v*g > v*p 
Proof. (a) For any n£ H, 


MGE, 1) = 2) MG, y)n(y) > inf MG, y) 
Y yeY 
Thus 


Ar(é) > int MG, y) 
yeY 
Also, since H contains every y e Y, 
inf M >i = 
ind MG, y) = mi, MG, 7) = Ar(é) 


and the first part of the theorem follows. Furthermore, for every xe X 
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Ag(x) = Ap(x) < A*r 


so that \*¢ < A*r. The proof of (b) is similar. 

Since \*¢ < A*r < v*p < v*g whenever G has a pure value, T will 
have the same pure value, and, since Ag(x) = Ap(x), any good strategy 
in G is a good strategy in T. The purpose of introducing T, however, 
was to analyze those games for which \*g < v*ę. We hope that for a 
large class of games it will turn out that A*p = vřr, as we would then 
have a reasonable theory for T and hence for G. 


Definition 1.8.4. Let G = (X, Y, M) be a game and let r = (=, H, 
M) be the mixed extension of G. Then, if r has a pure value vr, we 
shall say that G has value v = vp, and we shall call any good strategy 
in T a good strategy in G. 

Definition 1.8.5. Let G = (X, Y, M) be a game, and let T = (&, 
H, M) be the mixed extension of G. If there exists a £ ££ such that 
Ap(o) = A*r, then £o is called a maximin strategy for player I. If 
there exists an 79 e H such that Tp(m) = v*r, then vo is called a mini- 
max strategy for player II. 


Note that, if T has a value, then £o, 79 are good strategies for the two 
players. 

Tt is the basic theorem of the theory of games that every finite game 
has a value and both players have good strategies, so that, for finite 
games, the introduction of T provides a satisfactory theory. This theo- 
rem and some extensions will be proved in Section 2.3. 

Example. Matching Pennies. The matrix A of Gis A = E E 1) > 


T = (Z, H, M) where Z consists of elements £ = (a, 1 — a), H consists 
of elements n = (6, 1 — 8), 0 < a, B < 1, and 


MG, n) = a6 + (1 — a)(1 — 8) — a(1 — B) — (1 — a) 
= 1 — 2a — 28 + 4aß 


Thus 
Ar(Ẹ) = min [MẸ, 1), MG, 2)] = min [2a — 1, 1 — 2a] 
and 
A*r = Ar(1/2) = 0 
Similarly 
Tr(n) = max [28 — 1, 1 — 26] 
and 


v*r = T,(1/2) = 0 
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We see that matching pennies has value 0, and that each player has a 
good strategy—choosing heads and tails each with probability 1/2. 
We conclude this section with the following theorem. 


i 
Theorem 1.8.2. If two games are equivalent, so are their mixed ex- 
tensions. 


Proof. It is sufficient to prove the theorem in the special case where 
one of the two equivalent games Gi = (X, Y, Mı) and Gp = (X’, Y, 
Mp) is a reduction of the other. Suppose for definiteness that Gg is a 
reduction of G, and that f maps X onto X’. For every x'e X’ we de- 
fine Sy as the set of all ce X such that f(x) = 2’. Consider now the 
mixed extensions T, = (zı, H, Mı) and Tz = (2, H, Mg). For every 
x' e X’ and every £ e =, we define the function &*2 by 

Eaa) = DY A) 
ze Sr 
Then &*) €Z, and we define the mapping F from =; to Zə by setting 
F(t) = &2. Let g be a function defined over X’ such that, for every 
z' e X’, g(x')eS». Then, for every xe X and every f ez, we define 
the function &*; by 


* (x) = £o(x’) if there is an 2’ e X’ with g(a’) = x 
0 if there is no 2’ e X’ with g(x’) = x 


Then &*, © =, and F(é*,) = &. Thus, for every element $s © Ze, there 
is an element §*, © =, such that /(&*,) = &. In addition, for every 
£, eZ, and every n £ Hy 


Milf, 7) = È È Mila, vileo) 


zeX ye Y 


È DY Masa), ilen) 


zeX yeY 


DL Male’, Deilen) 


aeX’yeYreSp 


È È Male’, y*n) 


weX’yeY 


M2(F(E1), n) 
which completes the proof. 


ll 


ll 


PROBLEMS 


1.8.1. Let G be a game, and let T be i i i 
L ý its mixed extension. Sh ix 
extension of T is equivalent to T. in EO detail 
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1.8.2. Let G be a game whose payoff matrix is 


11270 
A= 
a 019 


Determine the value of G and the good mixed strategies for the players. [For any 
2 X n matrix (aij), i = 1, 2; j = 1, 2, ---, n, a good strategy for player I can most 
easily be obtained graphically by drawing the n lines 2; = aai; + (1 — a)azj, de- 
lineating for each æ, min z; (i.e., A(Ẹ)) which is a concave curve made up of line seg- 
ments, and then selecting a value of a for which this curve attains its maximum.] 

1.8.3. Solve the game of matching pennies with imperfect spying, i.e., find the 
value and good strategies. 

1.8.4. Let G = (X, Y, M) where X = (0 < x < 1), Y = (0 < y < 1) and M(z, y) 
=g — 2zy + 3y. Draw the graphs of A(z) and T(y), and locate on these graphs 
d* and v* respectively. 


CHAPTER 2 


Values and Optimal 


Strategies in Games 


2.1. Introduction 


We have proved in Chapter I that finite games with perfect informa- 
tion always have a value and each player has a good pure strategy. 
In this chapter we shall consider some general conditions under which 
games have values and the players have good (mixed) strategies. We 
shall here prove the basic theorem of the theory of games—that, for 
finite games, A*r = v*p, and that good mixed strategies exist for both 
players. This theorem can be considered as an assertion about convex 
sets in Cartesian n-dimensional space, and certain useful extensions of 
this theorem, which will also be given in this chapter, involve the asso- 
ciated concept of convex functions. To facilitate the understanding of 
these game-theoretic results, it will be necessary to introduce a sum- 
mary of the relevant facts about convex sets and convex functions. 
Games in which one of the players has a finite number of pure strategies 
are of particular interest in statistics and will be studied in some detail. 
Methods for solving games will also be considered. 


2.2. Convex Sets and Convex Functions 

We shall denote by S, the Cartesian n- 
tors s = (s1, +++, Sn), where each 8; is a real number; S, is also referred 
to as n space. (Henceforth, if we wish to emphasize that an element is 
or can be represented as a point in n space with n > 1, we shall desig- 


nate it by a letter in boldface type.) For any s = (s, naa Bp) Bide 
and any number }, As denotes the vecti 


dimensional space of all vec- 


or (Ası, +++, Asp), and, for any 
t= (4, -+-, t:)eS,, s+ t denotes the vector (sı + t, e ea Sn F ty); 
and s — t denotes s + (—1)t. The vector (0, 0, - 


-+, 0) will be denoted 
oints representable as 
d t, and the subset of 


by e. For any s, t with s = t, the set of all p 
As + (1 — à)t is called the line containing s an 
30 
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this line with 0 < à < 1 is the line segment joining s and t. For any s 


n 


and t, the number >> s;t; is called the inner product of s and t and will 


4 
be denoted by s-t. For any u # e and any constant c, the set of all 
vectors x with u-x = c is called a hyperplane. For any line land any 
point xo the hyperplane (s; — s2)-x = (Sı — S2)-Xo, where s; € l, soe i, 
Sı Æ So, is independent of sı, S2 and is called the hyperplane through xo 
perpendicular to l. If S and T are sets and u-x = c is a hyperplane in 
Sn such that, for x e S, u-x > c and, for xe T, u-x < c, then the hyper- 
plane u-x = c is said to separate S and T. For any u # e and any 
constant c, the set of all vectors x with u-x > c is called a half-space. 


n % 

The number (= 2%) is called the length of x, written |x|, and, for 
1 

any Xo and any r > 0, the set of all x with | X — Xo | <r (we denote 


this set by the symbol {x: | xX — Xo | <_r}) is called the sphere about xo 
of radius r. A point ue U is called an inner point of U if there is a 
sphere with center u which is a subset of U. A set U all of whose points 
are inner points is called open, and a set V for which the complementary 
set U of all points u ¢ V is open is called closed. For every set U, there 
is a smallest closed set containing U, called the closure of U and de- 
noted by U. A point s is called a boundary point of a set U if every 
sphere about s contains points ue U and points v g U. The closure of 
a set is obtained by adjoining to it all its boundary points not already 
in it. Since half-spaces are closed, any hyperplane separating S and T 
also separates their closures § and T. The diameter of a set R CS, is 
the least upper bound of | xı — X2 |, where x; and xz are in R. If ô is 
the diameter of R, then, for all xı, X2 €R, |x: —x2| < ô A set of 
finite diameter is called bounded. 

A set that has no points in it is called the null set and will be desig- 
nated by the symbol Ø. If S and T are any two sets then by the inter- 
section of S and T is meant the set of points common to both S and T, 
which will be designated by the symbol SM T. Thus, if S and T have 
no points in common, S N T = Ø. By the union of S and T is meant 
the set of all points in at least one of S and T, which will be designated 
by S U T. If S;, S2, «++ are a sequence of sets then we shall sometimes 
write () S; for Sı N S2 N ++ and US: for Sı U S2 U ~.. 


A subset S of Sn is called convex if for any two points sı and sz of S, 
the entire line segment joining Sı and sz is contained in S; i.e., As; + 
(1 — d)sp eS for 0 < à <1. Examples of convex sets are points, lines, 
line segments, spheres, hyperplanes, and half-spaces. For any set R, 
the set R* of all points r* which are centers of gravity of a finite set of 
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points in R with appropriate weights, i.e., which are all representable as 


r* = Mtr +++ Ante 
where 


k 
M20, DM=1, nek, i=l 
1 


is clearly a convex set containing R. Moreover, since 


" 
k+1 k Do Nt: 
i=1 
Mats = D n| Z | + Anges 
k 
și i=l iy 
int 
when 
k 
> i ~0 
i=l 


it is seen by induction on k that any convex set containing R also con- 
tains R*. Thus R* is the smallest convex set containing R. R* is called 
the convex hull of R. For example, the convex hull of two points is the 
line segment joining them; the convex hull of a circle is the circle and 
its interior. 

The following elementary properties of convex sets are easy to prove 
and are left as exercises at the end of the section: The intersection of any 
number of convex sets is convex. The closure of a convex set is convex. 
The set of inner points of a convex set is convex. (Note that a line seg- 
ment, considered as a subset of 2-space, does not have any inner points, 
but does have inner points when considered as a subset of the infinite 
line, the 1-space containing it. Thus, a point x belonging to a set S will 
be called a relative inner point of S if it is an inner point of S considered 
as a subset of the lowest dimensional space, that is, hyperplane, con- 


taining it. Similarly, for a relative boundary point.) A convex set 
always has (relative) inner points. 


Lemma 2.2.1. (a) Let S be a convex set, x an inner point of S, y a 
boundary point of S (belonging to S or not). Then the points (1 — A)x 
+ Ay are inner points of S for 0 << 1, and exterior points of S for 
X> 1. (b) A convex set S and its closure Š have the same boundary 
points, inner points, and exterior points. 


Proof. (a) Choose y, £ S such that Yn > y. Ifz = (1 —A)x +y, 
0 <A <1, is not an inner point of S, t 
Zn = zZ. Define x, by z, = (1 — A)Xn 
(1 — A). Then Xn ¥ S; otherwise, 


here is a sequence z, gS with 
e + Aya; 16: Xn = (Zn — AYn)/ 
since Yn €S, we would have z, eS. 
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Asn — œ, X, — (z — Ay)/(1 — A) = x, contradicting the hypothesis 
that x is an inner point of S. For) > 1, if there were an inner or bound- 
ary point of S on the line from x through y and beyond y, then the pre- 
ceding argument would make y an inner point of S, contradicting the 
assumption. (b) Since S C Š, an inner point of S is obviously an inner 
point of §, an exterior point of S is an exterior point of S. For an ex- 
terior point x of S there is by definition a sphere about x contain- 
ing no points of S, hence no limit point of S, consequently no points of 
Š, so that x is exterior to 5 as well; that is to say, in general, any set and 
its closure have the same exterior points. Let y be an inner point of 5, 
let T be a sphere with center y contained in S, and let s be an inner 
point of S N T; such an s exists; otherwise S N T would be contained 
in a hyperplane, and S N T = § N T = T would be contained in the 
same hyperplane. From (a), every point within a line segment joining 
S to a boundary point of S N T is an inner point of SN T. Since 
all points of the boundary of T are boundary points of S N T 
every point of T is within such a line segment, every point of T is an 
inner point of S N T, and therefore in S. Thus the inner points of S 
and S coincide. By exclusion, the boundary points of S and S coincide. 

Theorem 2.2.1. (i) If S and T are closed convex sets without com- 
mon points and at least one of S and T is bounded, there is a hyperplane 
u-x = c with u-x > cforxeS andu-x < c for xe T. 

(ii) If S and T are convex sets and if there exists a sequence of convex 
sets Sy such that (a) Sy and 7 have a separating hyperplane for each 
N and (b) every sphere about any point of S contains points in all but a 
finite number of S w, then S and T have a separating hyperplane. 

Proof. We first prove (i); say S is bounded. Figure 3 illustrates the 
construction used in the proof. 


Figure 3 
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Since T is closed and S is bounded and closed, there exist sequences 
{on}, {tn}, with o, eS and Ta £ T such that 


lon = ta] > inf Jjs-tl 
eS,teT 


and such that {Tn} is bounded, as can be seen from the inequality 
[lon] —| tl] <lon— tal 


and the fact that {on} is bounded. Hence there are points ø e S, t e T 
with 
jo—r|= = min Js-tl 
eS,te 


Then the hyperplane which passes through the midpoint of the line 
segment joining o and T and is perpendicular to this line segment, i.e., 


=x e-n: ($) =. 


has the required property. For, let q be any point with 


a) (e = 4) Se 

Then the square of the distance from t to any point on the line joining 
o and q is 

(2) IN = |ia + a-a r] 

and from (2) by expansion, we get 

(3) 


IA) = |q = o |? +2q-6)-@—r+lo—7/? 


ya eae of taking the derivative of (3) with respect to \ and setting 
= 0 is 


(4) F'O) = 2q — o): (© — 7) = 2q: (s — T) — 20: (o — 7) 
< 2e — 20- (0 — 7) 
Now, from the definition of ©; 


(5) o-(¢—7)—c¢ 


ll 


e-m (e-e) 


ilo- r|?>0 
From (4) and (5) we conclude that 


F0) <0 


Hence there are points w on the line segment joining o and q with 


ll 
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|w-r | < | o-—T b so that q ¢S, and, since q is any point satisfying 
(1), we conclude that, for all x eS, 
b (= Tt)x >c 
Similarly, (æ — t)-x < c forall xe T. 
To prove (ii), suppose the hyperplane 


UN'X = cy 


separates Sy from T, i.e., uy-x > ew for xe Sy, uy-x Sey for xe T, 
We may suppose, replacing uy, cy by uy/| uy l, cx/| uy |, that | uy | 
= 1 forall N. Let te T and let the points sy be a bounded sequence 
with sy eSy. Using Schwarz inequality, we then obtain 


=| t| <uy-t < ey < uy-sy < | sy | 


so that cy is bounded, and we may choose a subsequence of uy, cy with 
Uy — u, cy > cas N — œ through the chosen subsequence. More- 
over, |u| = 1, so that u že. Since uy-t < cy for all te T and all N, 
we have u-t < c for all te T. Fors eS, there is a sequence sy eSy 
with sy > sasN — œ. Since uy-sy > ey for all N, letting N > 0 
through the chosen subsequence yields u-s >c. This completes the 
Proof. Note that, in view of part (ii), we can, at the expense of replac- 
ing the strict inequalities of part (i) by weak inequalities, remove the 
requirement in part (i) that S be bounded, for, by taking Sy to be the 
intersection of S with the sphere Ky with center at the origin and radius 
N, the hypotheses of (ii) are satisfied by (i). 

Corollary 1. If sisa boundary point of a convex set T, there is a 
hyperplane u-x = c containing s such that u-x Se forxe 7. 

Proof. Choose a sequence sy — s with sy ¢ T, [See Problem 2.2.2 
and Lemma 2.2.1(b).] Then T and the sets Sy each consisting of the 
Single element sy satisfy the hypotheses of part (ii) of the theorem, so 
that s and 7 have a separating hyperplane u-x = c. Since u-s > cand 
US < c, we have u-s = c. 

The plane u-x = c of the above corollary is called a supporting hyper- 
plane to the convex set T at the point s. 

Corollary 2. IfS and T are convex sets without common points, they 
have a separating hyperplane. 

Proof. Let {sy}, {ty} be sequences of points with Sy eS, tye T 
Such that every sphere about any point of S, T contains at least one 
Sv, ty respectively. If Sy, Tar are the convex hulls of (s;, “++, Sy), 

ti, +++, tap) respectively, Sy, Tar are closed bounded convex sets with- 
Cut common points and have a separating hyperplane from part (i) of 
the theorem. Then for fixed M the sets S, Ta satisfy the hypotheses of 
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part (ii) of the theorem and consequently have a separating hyperplane. 
Then S and T also satisfy the hypotheses of (ii), using the sequence Tw, 
so that they have a separating hyperplane. 


Theorem 2.2.2. For any subset R of n space, every point r* in the 
convex hull R* of R is representable in the form 


n n 
"= J Nn 420, Ow =1, pef 
0 0 


If r* is a boundary point of R*, \9 can be chosen to be zero. 

Proof. Suppose first that r* is a boundary point of R*, and let u-x = ¢ 

be a hyperplane through r* with u-x < c for all xe R*. Say r* 
k 


L Ata As > 0. Then u-r; = c for i = l, ---, k. We show that, if 
1 
k 
k > N, we can choose constants a, «++, ap not all zero such that Ya; 
1 


k 
= 0, ar; =e. We use the fact that any set of more than n vectors 
T 


in n space is linearly dependent. If c = 0, choose a; not all zero so that 
k 
> ar; =e. Then 
j k 
Dila(u-r;) = e-u = 0 


i=| 


and hence 


If ¢ = 0, choose a; not all zero so that D a,(r; + u) =e. Then 
1 
k 
Da(utr)-u=eu=0 


i=1 


and hence 
k k 
lup Z a= 05 Ya; 
1 1 


k 
and since J a;(r; + u) = e, 


i=l 


k 
b3 Qf; =e 
2 
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Consequently, for any real number é 


k 
SA; + ta) = 1 
$ 


and 
k 
Xo (A; + tar; = s 
1 
Choosing 
ss 
t= — min— 
ai>0 Qi 


makes ; + ta; zero for some 7 and non-negative for all i, so that, if 
k > n, a representation of r* with less than k points of r is possible. 

For non-boundary points of R*, consider the sets Q, Q* of all points 
in (n + 1)-space of the form (r, 0), (r*, 0), where r e R, r* e R*. Then 
Q* is the convex hull of Q and all points of Q* are boundary points. 
Thus, for any r* e R*, there is a representation 


n 
(r*, 0) = D A:r 0) 
0 
where 
n 
reR, N20 BN=l1 
0 


so that a 
r= Ba Adi 
0 


This completes the proof. 
A point s in a convex set S is called an extreme point of S if it is not 


interior to any line segment of S, i.e., if there do not exist two points 
sı €S, s2 e S with sı # Sp and s = As) + (1 —A)sx,0 <A < 1. 

Theorem 2.2.3. A closed bounded convex set has extreme points in 
every supporting hyperplane. 

Proof. Say u-x = cisa supporting hyperplane of the closed, bounded 
convex set S, and let T be the intersection of S with u-x = c, so that T 
is a closed, bounded non-empty convex subset of S. Any extreme point 
of T is also an extreme point of S, since t = As; + (1 — d)sp, te T, 
s, eS, s.eS,0<A<1 imply sı £ T, sze T. Thus it suffices to show 
that T has extreme points. Now, for any closed bounded convex set T, 
the point t* e T farthest from the origin is an extreme point, since, if 
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t* = oti + (1 — do) te, tı Æ te, O < Ao < 1, the quadratic function 
IA) = | At + 1 Atal? = 2] 4. — tel? + At — te) te] + | te |? 


has a positive coefficient for \?, so that at least one of f(0) = | tə |? and 
SQ) = | tı |? must exceed f(\o) = | t* |2. This completes the proof. 


Theorem 2.2.4, If a closed bounded convex set has only a finite num- 
ber of extreme points, it is the convex hull of its extreme points. 


Proof. Say x1, +++, x, are the extreme points of the closed bounded 
convex set S, and let S* be the convex hull of X1, +*+, Xp. If there is a 
point soeS, so¢S*, there is a hyperplane u-x = with U-So >c, 
u-s* < o fors*eS*. Ife, = max (u-x), u-x =c isa supporting hy- 

xe 
perplane of S not containing any of the extreme points xı, 
therefore contradicting Theorem 2.2.3. 

If S is a convex set of Sn, a numerical function f defined on S is said 

to be convex if for all x1, x. € S and all 4,0 <A < 1 we have 


fxi + (1 = A)xe] < fOr) + (1 NSE) 


and it is said to be concave if 


FPx1 + (1 — d)xe] > FOr) + (1 AS) 


If the above inequalities are strict for X1 ¥ Xə, 0 < A <1, f is called 
strictly convex or strictly concave, respectively. Geometrically, a function 
is convex if every line segment joining two points on its graph lies en- 
tirely on or above the graph, and it is concave if the line segment lies 
entirely on or below the graph. By induction on k we obtain that, for 
any k points %1, +++, X, in S and any non-negative numbers \y, +++, dz 


with bD ài = 1, we have 
T 


‘++, X and 


(È na) < Ease) 


if f is convex, and 


f (= ix) 2 Eada 


if fis concave. A function f is convex (concave) if and only if the func- 
tion ¢ with 


A, xı, xo) = ft + (1 — A)xo] 
is a convex (concave) function of à on 0 SX < 1 for all x, x2 £ S [see 
Problem 2.2.6]. Thus the verification that a certain function of n vari- 
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ables is convex (concave) reduces to verifying that certain functions of 
one variable are convex (concave). 


Theorem 2.2.5. Let a function ¢ be defined and possess the first 
two derivatives throughout the closed interval (a, b). Then 9 is convex 
on (a, b) if g’” > 0 on (a, b). 

Proof. Since for any two points c and d in (a, b) with c < d, the mean- 
value theorem states that 


(1) old) — glc) = o'd- c), e<g<d 

we have, fora < yı < yo S band 0 <e <1, 

(2) elayı + (1 — a)y] — lu) = (1 — a) (y2 — y)g' (tı) 
where yı < {1 < ayı + (1 — a)y2, and 

(3) oly2) — layı + (1 — a)y:] = aly — y1)e’ (t2) 


where ayı + (1 — a)y2 < t2 < y2- Solving (2) and (3) for g'a) and 
¢'(f2) and noting that ¢’(¢1) < ¢'(¢2), since ”(y) > 0, we obtain 


(4) glay: + (1 — @)y2] < aply:) + (1 — @)e(y2) 


which proves the theorem. 
Examples of convex functions are: 
n > aiti 
Lamy |z|% a21; and e 
t=1 
The following relation exists between convex functions and convex 
sets [see Problem 2.2.7]: For any convex function f defined on a closed 
convex subset S of Sn, the set T of all points (x, y) € Sn+ı with x e5, 


y > f(x) is convex; if in addition f is continuous, T is closed. 

Theorem 2.2.6. If f is a continuous convex function defined on a 
closed convex set S of Sn, for any € > 0 and any xo £ S there is a linear 
function I such that U(x) < f(x) for xeS and U(x) > fo) — e If x, 
is an interior point of S, there is a linear function 1, with L(x) < f(x) 
for xe S and l (x0) = f(Xo)- 

Proof. Since the set T of points (x, y) with xeS,y>f(x)isa closed 
convex subset of Sn41 not containing the point (Xo, f(xo) — «), there ES 
strictly separating hyperplane, i.e., a u £ Sn and numbers a, c with 


ux + af(x) >c for xes 
u-xo + alf(o) — d <¢ 
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Then a > 0 and the linear function l with 


c— ux 
Ux) = ——— 
has the required property. 
Let Xo be an interior point of S. The set T has a supporting hyper- 
plane at (Xo, f(Xo)), i.e., there is a u; € S, and a number a with 


(1) uWy:X + ay > 1-2 + af(Xo) 


for allxeS and y > f(x). We shall show that a > 0 in (1). We con- 
sider two cases; u; = e and u; ~e. If u, =e then (1) becomes ay 
2 af(xo) and, by definition of a hyperplane, a #0. Since y can be 
taken arbitrarily large, it follows that a > 0. If u; = e, then for a 


sufficiently small e > 0, x’ = xo — eu; is an interior point of S. More- 
over, 


(2) uix’ < Uy -X 


so that substituting x’ for x in (1) and employing (2) yields ay > af (Xo) 
for all y. Thus in both cases a > 0, and consequently the linear func- 
tion l, defined by 
ui: (Xo — x) 

(3) h(x) = Pae + f(Xo) 
has the required property. 

It follows from Theorem 2.2.6 that a continuous convex function fon 
a closed set S is the upper bound of all linear functions 1 which never 
exceed it. If f is not continuous, this need not occur, as shown by 
f(x) = 0, z > 0, f(0) = 1, where S = {x:4 > 0}. A partial converse 
to Theorem 2.2.6, which will be used later, is 

Theorem 2.2.7. 


If A is a bounded subset of n space, the function ¢ 
defined by ¢(x) 


= sup (a-x) is continuous and convex on n space. 
aed 


Proof. For every e > 0 there is a finite set Ay = 
which is «dense in A 


a;—al <e Now | 
(1) 
so that, writing f(x) 


fai, +++, aw} in A 
, ie., for every ac A there is an a;e Ay with 
a; — a | < e implies 

[ax —ax|<¢qx| 
= sup (a;-x), we have f(x) > a-x — el x| for all 
2 ox) — d x|. But o(x) > f(x), Hence 
(x) > f(x) > ox) — d x| 


for all x. Choosing « = 1/M yields a sequence fy which converges uni- 


a £ A and hence f(x) 
(2) 
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formly to y on every bounded set. Since each far is convex [see Prob- 
lem 2.2.8] and continuous, ¢ is convex [see Problem 2.2.9] and continuous 


(since the convergence to ¢ is uniform). 
Theorem 2.2.8. Let f be any continuous convex function defined on 
Sn, let {xm} be a sequence of points in Sn, and let {Am} be a sequence of 


non-negative numbers with D> Am = 1. If DO \ml Xm| < œ, then 
1 1 


(2 Wa) E mfn) 


with actual inequality if f is strictly convex, unless (x; — x,)A;Aj = 0 for 
all i, j. If f(x) ~ +o as |x| — © and 2nf(%m) converges, then 


=Am| Xm | converges. 
Proof. Say Xo = >» Xm. According to Theorem 2.2.6 there is a 
1 
linear function I such that U(x) = u-x + ¢ with U(x) < f(x) for all x and 
U(xo) = f(xo). Moreover, if f is strictly convex, (x) < f(x) for x # Xo, 
since I(x,) = f(x1) will yield 


Xo + X1 Eo) +E) _ U(X) + U1) ant Xo + X1 
s( 2 ) < 2 2 ( 2 ) 


unless x; = Xo. Then 
> Ninf (Km) Z Do rml(Xm) = U%o) = F%o) 
i 1 


with, for strictly convex f, actual inequality unless Xm = Xo whenever 


An > 0. i i 
"The last part of the theorem will follow if we show that there exist 


constants a > 0, b with f(x) > al x| + b for all x. We may suppose, 
adding a constant to f(x) if necessary, that f(x) 2 1 for all x. Unless 
there is an a > 0 with f(x) > a| x | for all x, there is a sequence {xm} 


with 

69) SEn) < | z'n |/m 

it follows from (1) that |z'm| > mf(X'm) > m. 
hypothesis f(x'm) — %, 80 that, by (1) again, 


= CX! my Where Cm = M/| X'm|; then cm — 0, 
| =m. Then, for m large enough, 


Since f(x’) 2 1 for all m, 
Thus | 5 cen | — œ, and by 
| Xia |/m — œ. Write Ym . 
and | ym | —> oasm — since | Ym 
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0 < cm < 1, and, since f is convex, 


fm) < (L — Cm)f(O) + Cmf(Z’m) < f0) + 1 


This contradicts the hypothesis that f(ym) > +% as |ym|— © and 
completes the proof. 


PROBLEMS 


2.2.1. The intersection of any number of convex sets is convex. 
2.2.2. The closure of a convex set is convex. 


2.2.3. (a) A convex set always has (relative) inner points. (b) The set of inner 
points of a convex set is convex. 


2.2.4. For any set R, the convex hull of the closure of R is contained in the closure 
of the convex hull of R, with equality if R is bounded. 

2.2.5. Give an example to show that the following is not in general true: If S$ 
and T are closed convex sets without common points there is a hyperplane u-x = ¢ 
with u-x > c for xeS and u-x < c for xe T. 

2.2.6. A function f defined on a convex subset S of Sn is convex if and only if the 


function ¢ with ¢(A, x1, x2) = f[Ax1 + (1 — )x2] is a convex function of à on 0 < 
` <1 for all x1, x2 eS. 


2.2.7. For any convex function f defined on a closed convex subset S of Sn, the 


set T of all points (x, y) € S,41 with x €S, y > f(x) is convex; if in addition f is con- 
tinuous, T is closed. 


2.2.8. If f: is for each ¢ convex in x on a convex set S and if f(x) = sup f(x) is 
t 
finite for each x e S, then f is convex on S. 


2.2.9. If fi, fo, «++, is a sequence of convex functions all defined on a convex set 
S, which converges pointwise to a function f, then f is convex on S. 


2.2.10. Give an example to show that in the last sentence of Theorem 2.2.6 the 
hypothesis that xo is an interior point is necessary. 


2.3. Games with a Value 
The basic result of the theory of finite games is 


Theorem 2.3.1. Every finite game has a value, and each player has 
at least one good (mixed) strategy. 


Proof. Let G be any finite game, and let A = | | a; | l i=1, +++, m; 

j= 1,---+,nbe the matrix of G. We must show that the mixed extension 

T = (Œ, H, M) of G has a pure value and each player has a good pure 

strategy in T, where =, H consist of all points Ẹ = (E(1), «++, &(m)), 
m 


n= (01), +++) 1@)) in Sm, Sn with li) > 0, E EO = 1, a(j) > 0, 
n 1 


x n(j) = 1 respectively, and 


M, n) = D atnl) 


ij 
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Let S* be the convex hull of the n points c1, ---, Cn of Sm whose coordi- 
nates are the columns of A, i.e., Cj = (Qij, ***, amj). Then 


o MG, n = DH (was) = 8s 
i j 
where s = >> n(j)c;eS*, and, since every s e S* has a representation 
1 


n 
s = > n(j)c;, ne H,T is equivalent to the game T; = (Z, S*, My) with 
1 


My (È, s) = Ẹ-s. To show that T, has a pure value, it is sufficient, from 
Theorem 1.6.1, to show that 


(2) v*p, < A*r 

We have 

(3) w= nar (Ẹ-s) = max (S1, ++, Sm) 
and 

(4) vr, = min Tr (s) = Tr (6) 


where sọ e S*. To establish (2), it is sufficient to find a o £ = such that, 
for all s e S*, 
(5) v*r < Šos 


since it would follow from this fact that 
Fi < inf Eos = Ar, (Ëo) < Mr 


Let T consist of all points t = (4, *** tm) E Sm ay < vp, for i = 
1, --- m. Then T is a convex set not intersecting S*, since it follows 
, 3 M: 


from (3) and (4) that any point S € S* has at least one coordinate >v*r,. 
Hence, by corollary 2 of Theorem 2.2.1 there is a hyperplane a-x =c 
with a x > c for xe S* and a-x Se for xe T. Let 6; be the point in 
Sm whose ith coordinate is 1 and whose remaining coordinates are zero. 
Then the point so — ô; € T, and 
a-So = ¢ È a (So — 8:) 

so th 
= a6; 20 


and thus s 
a;>0 for i=1,: m 


Let fo = a E aando=0/ Dea Then £Z and 


i=l = 
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&-s>v for ses 
(6) tot <v for tef 
In particular, the point t* = (u*p,, +++, v*r,) € T, so that 
(7) Eo-t* = v*p, Sv 
and consequently we conclude from (6) and (7) that, for all se S, 
o's > v > v*y, 


which establishes (5). Thus v is the value of G and & and so are good 
(mixed) strategies for I and II respectively, which completes the proof. 
Notice that the good strategy So for player II is a boundary point of 


S*, since So — e(l, +++, 1) ¢S* for any e > 0. According to Theorem 
2.2.2, So can be represented as a convex linear combination of m of the 
points (cy, +--+, Cn). A similar argument applies to player I. Thus we 
have 


Theorem 2.3.2. If G = (X, Y, M) is a finite game with matrix 
|| az; |], = 1, =+, m, j=1, +++, n, then each player has a good 
strategy which is a mixture of at most min(m, n) pure strategies. 

Various extensions of Theorem 2.3.1 to infinite games have been 
given. A simple criterion for a game to have a value is 


Theorem 2.3.3. A game G = (X, Y, M) has a value if the following 
condition is satisfied: For every e > 0 there exists a finite set 21, +++, Em 
of elements of X such that, for every x in X, there exists a mixture £ of 
21, ***, Zm, such that, for all y in Y, 


M(E, y) > M(x, y) — e 


Proof. In view of Definition 1.8.4 and Theorem 1.6.1, it is sufficient 
to show that 


v*p = Ařp 
which will be proved if we can show that, for every e > 0, 
(1) u*p — A*r < 2e 


Let e be any positive number, and let 2, +- *, Xm be elements of X such 


that, for every element z of X, there exists a mixture E of x, ++ 
such that 


+, Im 


(2) M(é, y) > M(z, y) — e for all yin Y 


The set S of all vectors v(y) = (M (2, y), > 


++, M (Em, y)) for ye Y isa 
bounded subset of S, 


m, Since M is a bounded function. Hence S contains 
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a finite subset v(y1), --+, V(Yn) such that, for every y there is a 7 such 
that 


(3) I vw) — vo) | <e 

Since for any vector Z = (21, 22, ***, Zm) | z| < G2)" = | z | for all z, 
inequality (3) implies that for every y there is aj such that 

(4) | Men y) — Mtn y) | <e for i=l, -m 


Now we set Xo = (tı, ***, tm) and Yo = (Y1, +++, Yn). Then the game 
Go = (Xo, Yo, M) is finite and by Theorem 2.3.1 has a value v and good 
strategies E* = (E"(1), =+, Em) and m* = (1*(01), ++, 9*(n)) for the 
two players. Thus, for all mixtures § and n of Xo and Yo respectively, 
we have 


(5) M(E, n*) < vo < ME*, n) 
From (2), we obtain, for j = 1, +++, n, 
(6) M(x, y) S MEP, y) + € 


and hence, multiplying (6) by n*(j) and summing, we obtain 


(7) Mx, n*) = SMG, y*0) < 2 ME, yn* G) + € 


j=1 


MEP, n*) +e 


ll 


From (5) and (7) we have for allaeX 

(8) Mz, n“) < MEM, n) +e Smt 
and thus taking the supremum of (8) with respect to x yields 
(9) v*p < Tr(n*) Svo + € 


Also, from (4) we infer that for every Y there is a j such that 


M(x y) = M@o yi) — € for i=l, e,m 
and th 
i M(t, y) = MES yi) — € 


d rom (5 
and hence from (5) mies wi 


and thus 
(10) 
From (9) and (10 


A*p > Ar(E*) 2% — e 


) we infer (1) which completes the proof. 
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Corollary 1. Any game G = (X, Y, M) where X is finite has a value, 
and player I has a good strategy. 


Proof. Since the condition of the theorem is satisfied, G has a value v. 
Let X = (x, --+, £m), and let T = (Z, H, M) be the mixed extension 
of G. Then 

Mp Svs ee Ar(&) 
ez 


es p 
and there exists a sequence Ẹ®, &, ---, in Z such that Ar(Ẹ®) > A*p 
as k — œ, where 


E® = (£1), (2), ---, E(m)) 


Since all the vectors lie in the m-dimensional cube with side 1 (more 

exactly, in the fundamental probability simplex in m space), there is 

a convergent subsequence which approaches a vector &* = (£*(1), --- 
m 


£*(m)), with £*(0) > 0, > &*(@ = 1. Since the function M is continu- 
i=l 


ous and linear in the components of Ẹ®, M(E™, y) > M(E*, y) for all 
yask — œ. Also, since 


M(E™, y) > Ap(E™) for all y 
we obtain 


M(E*, y) = lim ME, y) > lim Ap&™) =A", for all y 
ko o ko o 
and §* is a good strategy for player I. 


Corollary 2. Any game G = (X, Y, M) where X , Y are closed bounded 
subsets of Sm, Sn and M is continuous has a value. 


Proof. Since M(x, y) is defined for x in Sp and y in Sn, we can con- 
sider M as a function of one variable ze Z = X X Y in Smẹn. Since M 
is continuous on Z, it is uniformly continuous: For every e > 0 there is 
a 6>0 such that | M(z1) — M (z3) | < whenever | zı — za | <6. 
Let x1, -+ +, x, be a finite subset of X such that for every x e X there is 
an i with | x; — x| <6. Then | (x;, y) — (x, y) | = |x; — x| <6 for 
all y, and | m (x, y) — M (z; y) | < for all y. Thus Theorem 2.3.3 
applies. 

If a game has a value, then either player can, by a suitable choice of 
strategies, approximate this value to within e for any e > 0 no matter 
how small. Such strategies are called e- 


good strategies for each player 
and are defined as follows: 


Definition 2.3.1. Let T = =, H, M) be the mixed extension of the 
game G = (X, Y, M), let e > 0 be given, let T have value vp, and let E 
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be an element of = and 7 be an element of H such that 

Ar) = wr — € 

Tr(n) S vr + € 


Then £ and 7 are called egood (mixed) strategies for players I and II 
respectively. 
PROBLEMS 


2.3.1. In a finite game the set of good strategies for either player is bounded, 
closed, and convex. P 
2.3.2. Carry out the steps of the proof of Theorem 2.3.1 for the finite game with 
matrix ee 
45 8 
a= ee 8 5 6 6 


and find the value and good strategies for both players. 


2.4. S Games 


er I has a finite number of pure strategies 
ful geometrical interpretation, even when 
layer II is infinite. Let G = (X, Y, M) 
be a game in which X = (ti, ***, tm). We associate with each ye Y 
the point s(y) in m space with coordinates M (t1, Y), +++, M @m, y), and 
let S be the set of all points s(y) as Y varies over Y. Then G is equivalent 
to a game which is played in the following manner: Player II selects a 
point s = (S1, S2; ***, Sm) from the set S. Independently of his choice, 
player I selects a coordinate. The payoff to player I is the value of this 
coordinate. Such games will be called S games and are formally defined 


as follows: 


Definition 2.4.1. Let Im 
a bounded subset of m space, 


The games in which play 
have an interesting and use 
the number of pure strategies for p! 


consist of the integers 7 = 1, +++, m, let S be 
and, for î € Im, S eS, let 


M(i,s) = si 
where s; is the ith coordinate of s. Then the game G; = (Im, S, M) is 
called an S game. 

> i iginal game G, as well as in 
A mixed strategy for player I in the original G, as 
the S game, is a vector È = EU), E2), +++, §(n)) with £(@) > 0 and 


I ü) =1. A mixed strategy for pl 


isl robability distribution 7 over Y, which in the S game is 


choice of a vr ; 
equivalent fas selecting a point in Sm which is attainable as a countable 


ayer II in the original game is a 
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convex linear combination of points in S. Thus the mixed extension of 
an S game is equivalent to the game T, = (=, R, M) where Z consists of 
all the vectors & defined above, R is the set of all points r of m space 
which are countable convex linear combinations of points of S and 


ME, r) = §-r 


It is clear from the definition of the convex hull of a set that R con- 
tains the convex hull S* of S. We shall now prove that R is in fact 
identical with S*. 


Theorem 2.4.1. Let S be any set in Sm, S* the convex hull of S. Then 
any convergent convex mixture of a countable number of points of S 
belongs to S*. 


Proof. We first show that any such point s, given by s = D ASi, where 
1 


without loss of generality we take à; > 0, and 2), = 1, s; eS for all å, 
belongs to the closure S* of S*. We write 


s= lim >)A,s; = lim (= x) (= ts:) 
i i=l i=l 


n= o iml n= o 


n 


n n 
where ¢; = TE 0,5 = 1. Since DA —> lasn > 0, we 
i=l i. 


i=1 1 
have 


s= lim tis; = lim z, 


n> e; na 
where Zz, £ S*. Hence s e 5*, 

N ow let Qr, of dimension k < m, be the lowest dimensional hyperplane 
containing the points S1, S2, +- +, let P* be the convex hull of s1, S2, +- 
and let N be an integer so large that the convex hull P*y of BiSa iy 
Sy is a k-dimensional convex set (simplex) in Qr, i.e., has inner points 


in Q. By definition of Q}, such an N clearly exists. Also P* C S*, and 
P*y C P* C Qr. As above, we write 


ee) 


o N ~*~ 
s= bF AS; = S AS; + SS AS; 
1 1 


N+ 


(EE) «(a) (S 


N +1 WV 4-1 


ll 


uty + (1 — p)te 
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N N 2 
where 0< 2 = OM <1, D&=1, DH w=], ta vi > 0, so that 
1 1 N+1 
N 
tı = J, ts; is an inner point of P*y, consequently an inner point of 
1 


œ 
P*, and t = y;Si belongs to P* by the first part of the proof. 
P 


N41 

Hence, by meal 2.2.1, se P* C S*, and the theorem is proved. 
Corollary 1 of Theorem 2.3.3 asserts that every S game has a value v 
and that I has a good strategy & so that §-s > v for all s e S*. Player 
II will generally not have a good strategy, but there will exist a sequence 
Sn = (Sni, ***, Sam) € S* with max (sni) > vasn — œ. If S* is closed, 


i 

a subsequence of s, converges to an g* = etn =, 6%) eS*, and 

max (s*,;) = v, so that s* is a good strategy for II. The main facts are 
i 


summarized in 

Theorem 2.4.2. Every S game has a value and player I has a good 
strategy. Moreover, if (Im, S, M ) is an S game and if S is closed (or 
more generally, if the convex hull S* of S is closed), then player II has 
a good strategy which is a mixture of at most m pure strategies. If S 
is closed and convex, player II has a good pure strategy. 


Proof. If II has a good strategy Ss, it isa boundary point of S*, and 
thus, by Theorem 2.2.2, s is a convex linear combination of at most m 
points of S. If S is closed and convex, S* = S, so that II has a good 
strategy in S. The remaining facts have already been proved. 

We observe that the proof of the fundamental theorem of the theory 
of games [Theorem 2.3.1] was accomplished by first converting it into 
an S game, with S consisting of the points ¢1, +++, Cn in Sm. The finite- 
ness of the set S was not essential to the proof. What was required 
there was the closure of S*, to insure that the minimum of Y(s), for 
s e S*, would be assumed by a point So ¢S*. Thus the same proof can 
be employed to give the more general result contained in Theorem 2.4.2, 
It could also be used to establish the truth of corollary 1 to Theo- 


rem 2.3.3. 
The proof of the f ; 
geometric method for locatin 


‘undamental theorem also suggests the following 
g the value and good strategies in an S 
game. For any real number 4, let Ta = {t= (4, ta, > tm) €Sm: ti < a, 
i= 1, 2, ++, mh, i.e Ta is the negative orthant” in m space shifted 
so that the vertex (0, 0, -+ +, 0) is at the point (a, a, +++, a). Tais obvi- 
ously convex. Let at = sup {a: Ta N S* = @}. (In the proof of 


Theorem 2.3.1, a* is in fact equal to v*p,, the value of the game.) T,» S* 
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= Ø; hence by corollary 2 to Theorem 2.2.1 there exists a hyperplane 
&)-x = c separating Ta» and S*, where without loss of generality we can 
assume &) £=. Then §o will be a good strategy for player I, and a* =e 
is the value of the game. Further, T,+ N S* = Ø. Lets* be any point 
in Tar N S*. If S* is closed, then s* e S* and is a good strategy for 
player II, which is a mixture of at most m pure strategies in S. If S* is 
not closed, then there exists a sequence of points s*,, s*9, --- in S* 
such that s*, — s*, s*, a mixture of at most m + 1 pure strategies 
in S, and such that max (s*,;) — a*. 


The above discussion also shows that the assumption of boundedness 
for the payoff function M of the original game G is not required. What 
is required is the weaker condition that M be bounded from below, i.e., 
there exists a positive constant K such that M(x, y) > —K for all 
x and y. 

The notion of S games and their mixed extensions are capable of a 
further useful generalization. Consider the game G = (S, T, M) where 
S and T are bounded convex subsets in m space; player I chooses a 
point s in S, player II independently chooses a point t in T, and the 
payoff M (s, t) to player I is s-t units. From Theorem 2.4.1 and the 
convexity of S and T, it follows that any mixing of strategies on the 
part of the players by means of a chance device can add nothing new to 
the game—that G is identical with its mixed extension. If, in addition, 
S and T are assumed to be closed, it follows from corollary 2 to Theo- 
rem 2.3.3, since M is continuous, that the game G has a value v, hence 
a pure value. By Problem 1.6.3 it follows at once that there exist 
points s* in S, t* in T (i.e., good strategies for the two players), such that 


min (s*-t) = max (s-t*) = y = s*-t* 
ter ses 


As we shall have occasion to use these res 
book, we summarize them in 


Theorem 2.4.3. Let G = (S, T, M) be a game, S and T bounded, 
closed, convex subsets of S 


m, M(s, t) = s-tforseS,teT. Then G has 
a value, and each player has a good pure strategy. Equivalently, for 


any closed bounded convex sets S, T of Sm, there are points s* eS, 
t*e T with 


ults in later portions of the 


s-t* < s*.t* < s*.t 
for all se 5, te T. 
PROBLEMS 


2.4.1. Find the value and good strategies for the following S games in 2-space, 
with points x = (21, 2»): 
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(a) S = fx: (z1 — 8)? + (z2 — 3)? < 9} 

(b) S = {x: (zı — 10)? + (z2 — 10)? < 50} 
1 

(c) S= fke > 0a >i} 


2.4.2. Find the value and good strategies for the game G = (S*, T*, M), where 
S* is the convex hull in 2-space of the three points (2, 8), (4, 3), (7, 2) and T* is the 
convex hull of two points (1, 0) and (0, 1). 


2.5. Games with Convex Payoff 


We have seen that any game in which X and Y are closed and bounded 
subsets of Sm, Sn and in which the payoff M is continuous has a value. 
A further restriction on the payoff, convexity in y for fixed x, enables us 
to obtain a more precise result—that player II has a good pure strategy 
and that player I has a good strategy which is a mixture of at most 
n + 1 pure strategies. This fact is the main result of the present sec- 
tion and is incorporated in the following theorem and corollaries. 


Theorem 2.5.1. If G = (X, Y, M) is a game such that Y is a closed 
bounded convex subset of S, and M(x, y) is for each x a continuous 
convex function of y, then G has a value, player II has a good pure 
strategy, and, for every «e > 0, player I has an e-good strategy which is 
a mixture of at most n + 1 pure strategies. 


Proof. Since 
(1) T(y) = sup M(x, y); v* = inf T(y) 
z y 


we can select a sequence {Ym} such that lim T(Ym) = v*. Since Y is 


m= o 


closed and bounded, we can choose the sequence {Ym} so that lim Ym 


m= 
= y' and y’c Y. Also, for each z, M(x, Ym) < T(ym), and hence by 
Problem 1.6.4, 


(2) M(z, y’) < T(y’) = v* 


Consider the collection £ of linear functions 1, defined on Y, for 
which there is an v e X with L(y) < M(e, y) for all y e Y; and, for J in 
£ and any ¢ > 0, let H; be the set of all y such that L(y) > v* — e. 
Since J is continuous, H; is open in Y. We shall show that the sets H, 
cover Y; i.e., every ye Y belongs to some Hı. Select yoe F. Then, 
there exists an % in X such that M (zo, yo) > v* — €/2, since 


3) sup M(x, yo) = T(yo) > v* 
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Now by Theorem 2.2.6 there exists a linear function lo such that lo(y) < 
M (xo, y) for all y and lo(yo) > M (xo, Yo) — ¢/2. But, since M (zo, Yo) > 
v* — e/2, lo(yo) > v* — e, and hence, yo e Hu. Thus the open sets H, 
cover Y so that, by the Heine-Borel theorem, we can find a finite sub- 
collection of these sets which cover Y. There is then a finite set of 
linear functions lı, ---, ly each in the collection £, with max l;(y) > 


* 


v* — e for each ye Y. Since J; is in £, there is an tie X with L(y) < 

M (z, y) for all y. l 
Let E = (&(1), (2), -+-, &(n)) be a mixed strategy for player I in 

which he selects z; (i = 1, 2, ---, N) with probability (i). Then 


N N 
(4) ME, y) = D0 EOM (z; y) > D OLY) 
i=l i=l 


for ally e Y. We need now to show that we can find a €* with at most 
N 


n + 1 of its components different from zero such that >> *(i)li(y) > 
i=1 


v* — e Suppose 


(5) L(y) = 2 iji + Ging 
= 


Let U be the convex hull in Sri of the set of points u; = (ait, ajo, «+ i 
@in41), and let V be the set of all points of the form (y, 1), where yeY. 
Then V is closed and convex. We apply Theorem 2.4.3 to the sets U els 
obtaining points u* e U, v* = (y*, 1) e V such that 
(6) u-v* < u*.y* < u*.y 

forall ue U, veV. There is an i for which U,(y*) > y* — €; i.e., u-v* 
> v* — e so that u*.y* > y* — eand u*-v > o* — ¢ for all veV. 
Moreover, u* is a boundary point of U; otherwise there would be a 
ô aa 0 such that the point us obtained by adding 6 to the (n + 1)st co- 
ordinate of u* would be in U, and uj-v* = u*.y* + ô, contradicting 
the inequality U v* < u*.y*, Theorem 2.2.2 then implies that u* 

N 
= x Ou with #0) >0, D pa = 
= i=] 


1, and at most (n + 1) of 
the &*(2)’s are different from zero, 


Then, for all yeY, 


N 
(7) LEOLI) = uty, 1) > ye — 
From (2), (4), 
therefore, that 


§* is an €good 


and (7) we see that v* 
y’ is a good strategy 
strategy for player I. 


is the value of the game; from (2), 
for player II; and, from (7), that 
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Corollary 1. If in addition X is a closed bounded subset of Sm and M 
is for each y continuous in x, then I has a good strategy which is a mixture 
of at most n + 1 pure strategies. 

Proof. Let &, mixing x™,, ---, x® ny1 in proportion €(1), ---, 
E™ (n + 1) be a mixed strategy for player I such that M(Ẹ®, y) > v* 
— 1/k for all y. Choose a subsequence of k for which x, —> x*,, 
EG) — *(i) for i = 1, 2, +++, n +1, as k — œ through this subse- 
quence. Then 


n+1 
ME, y) = DEO OME™,, y) > ME, y) 
isl 
as k — œ through the subsequence, where §* mixes x*), +++, x*n4, in 


the proportions £*(1), +++, *(n + 1), so that M(§*, y) > v* for all y 
and Ẹ* is a good strategy for player I. 

Corollary 2. If G = (X, Y, M) is a game in which X, Y are closed 
bounded convex subsets of Sn and M(x, y) is for each y a continuous 
concave function in x and for each x a continuous convex function in 


y, then G has a value and each player has a good pure strategy. 
Corollary 3. If G = (X, Y, M) is a game in which X, Y are closed 
bounded subsets of S, and for each x = (tı, +++, %)eX and y = 


++, yn) © Y, M(x, y) = È sys, then G has a value and each 
i=1 


i= 


(v1, 
player has a good strategy which is a mixture of at most n + 1 pure 
strategies. 


Proof. Let T be the mixed extension of G. Then T is equivalent to 


the game T}, = (X*, Y*, M) where = is the convex hull of X, Y* is the 


convex hull of Y, and M(x, y) = X xyi, xeX*,yeY*. Hence, by 


corollary 2 above, T; has a value and each player has a good pure strat- 
egy inT,. By Theorem 2.2.2, each pure strategy of D can be repre- 
sented as a mixture of at most n + 1 pure strategies in G. This com- 
pletes the proof. 

Note that this corollary also follows from Theorem 2.4.3. 


PROBLEMS 


2.6.1. If in corollary 1 to Theorem 2.5.1, M(x, y) is for each x a continuous and 


2 I ' tegy is unique. 
strict] function of y, then player II's good pure au 
me prer D Y. 1. be a game in which X, Y are closed bounded subsets of 
5.2. = (X71, 


Sn and M(x, y) = Dd ayriy;. Then G has a value, and each player has a good 
J 


t; . 
strategy which is a mixture of at most 7 + 1 pure strategies. 
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2.5.3. If, in Theorem 2.5.1, X is finite, player I has a good strategy which is a 
mixture of at most n + 1 pure strategies. , 

2.5.4. Find the value and good strategies for the following games G = (X, Y, M) 
where X = (0 < z < 1) and Y = (0 < y < 1) and 


(a) M(z, y) = (y — 2)? 
©) Mlz, y) = -s+ y-a? 
(O) M(z, y) = y — 2ry +1 


2.6. Extended Mixed Strategies 


Any game G = (X, Y, M) where X, Y are closed bounded subsets of 
Sm, Sn and M is continuous on X X Y has a value according to corol- 
lary 2 of Theorem 2.3.3. Thus this corollary guarantees e-good strat- 
egies to the two players. However, if the concept of mixed strategies is 
suitably extended, good strategies for both players can be shown to 
exist for all games of this type. That this does not hold with our re- 
striction to discrete distributions is illustrated in the following example. 

Example. X = (0 < z < 1), Y = (0 < y < 1), M(x, y) = fæ — y), 
where f(t) = t(1 — t) for0 < t < 1 and f4) = f(t + 1) for =1 < £3 0. 
Notice that, for every fixed yo, 


1 si 1 
fae, Yo) dz -f fe+1—99 de + | fæ- w) dx 


= Í E flu) du + J Yu) du = { dua 2 


1 
Thus, for every mixed Strategy n, i M(x, n) dx = 1/6. Now 
0 
M(z, n) = D Ofe — yi) cannot be identically 1/6, since at z = Yi 
the derivative of ni) f(z — y:i) with respect to x fails to exist, while the 
derivatives of the other terms do exist at this point, so that ee 
te 
1 
f M(x, n) dx = 1/6 and M(x, n) = 1/6, 
there is an % with M(z, n) > 1/6, i.e., T(n) > 1/6 for every n. Bimi- 
larly f M(E, y) dy = 1/6 for all £ and M(E, y) = 1/6, so that there is 


a Yo with M(E, yo) < 1/6 and A(¢) < 1/6 for all £ From corollary 2 


fails to exist at z = yi. Since 
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above, A* = v*, so that their common value is 1/6. Thus neither player 
has a good strategy. 

It is clear that in this example, the players fail to have good strategies 
only because our concept of mixed strategy is too restricted. If we per- 
mit player I to choose a number at random on the unit interval, so that 
the probability that the number chosen falls in an interval (a, b) isb — a 

i 

(0 <a <b < 1), then his expected gain against any y is f M(x, y) dz 
= 1/6, and similarly for player II. Even if the choice of a number at 
random is not permitted as a mixed strategy, it still serves a useful 
purpose in the above example: it permitted us to find the value of the 
game. Moreover, using the fact that a Riemann integral can be approx- 
imated by a sum, we can for any e > 0, find e-good mixed strategies for 
each player in the following way: Since the function M(x, y) is uni- 
formly continuous, for every e > 0 we can find a ô > 0 such that 


| M(a1, y) — M (22, y) | < € 


for all y whenever | ty — T2 | < 6. Choose 7 so large that 1/n < ô, and 
let £ be the mixed strategy assigning probability 1/n to each of the 
points a; = i/n,1<i<n. Then 


12 n Ti ee 1 
MG») = OMe =D f meni- 
1 


where ap = 0. Thus £ is an «good strategy for I. Ina similar manner, 


we get e-good strategies for player II. 
Pursuing the idea suggested by the above example, we define extended 


mixed strategies. 

Definition 2.6.1. Let T = (Z, H, M) be the mixed extension of the 
game G = (X, Y, M), and let T be any function which associates with 
M (cz, y) a real valued function ¢ defined on Y. Then, if for every e > 0 
there is a mixed strategy £ € = such that for all ye F, 


| oy) -ME v) |< e 


we call T an extended mixed strategy. 

We may consider T purely as a convenient mathematical device for 
finding the value of a game and e-good strategies. In games in which X 
tann space, the T’s which are particularly useful, and which are the 
only ones considered in this book, are the density functions on X. 
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Definition 2.6.2. Let X be an n space, and let f be a non-negative 
function defined on X. Then if the integral 


ÍS. f(x) dx 


exists and is equal to 1, we call f a density on X. 


In a game (X, Y, M), a density f on X associates with M (x, y) the 
function 


oly) = MCS, 9) = | M, f) ax 
x 


In order for f to be an extended mixed strategy, we must verify that 
M(f, y) is uniformly (in y) approximable by M (£, y), and this will not 
always be the case. A convenient criterion is given by the next theorem, 
for which we need some preliminary definitions, 


Definition 2.6.3. Let E be a set of real-valued functions Y, each of 
which is defined over a bounded subset A c Sn. If, for every «e > 0, 
there is a 6 > 0, such that, for every subdivision R = (Ry, -- <, Ri) of 


A into sets R;, where L dx exists and the diameter of each R; is less 
Ri 
than ô, we have, for each y 
k 

> [sup V(x) — inf wwf dx < e 

i=l xeR; xe Ri Ri; 
then the set Æ of functions is said to be uniformly Riemann-integrable 
over A. 


Definition 2.6.4. Let M be a real 
xeX,yeY, X C Sn, and let E be 
defined on X such that y e FE if and o 


-valued function of two variables 
the set of real-valued functions y 
nly if, for some yin Y, 


¥(x) = M(x, y) for all x in x 


Then, if E is uniformly Riemann-integrable over a bounded subset 
A C Sn, we say that M (x, y) is uniformly (in y) Riemann-integrable 
over A. 


Theorem 2.6.1. Let G = (X, Y, M) bea game, let X be a subset of 
Sn; and suppose that M (x, y) is uniformly (in y) Riemann-integrable 


over every bounded subset X. 10f X. Then every density function over 


X is an extended mixed strategy for player I. 
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Proof. From the definition of a density on X, it is apparent that, for 
any such function f and any « > 0, there is a bounded subset X, of X 
such that f is bounded on X, and 


(1) fioa: 


Let A be the upper bound of | M(x, y) |, and let X-X; be those points 
of X that are not in X;. Then, from (1) we obtain 


A f(x) dx = A(1 - f 1% dx) = Ae 
X-X: xi 


and, from this, 
f M(x, w(x) dx| < Ac 
X-Xı 


and then, from the definition of M (f, y), 


@) |M, y) — f M, vite) dx| < Ae 
Define 
g(x) = i : 710 for xeXı 
0 for xgXı 


Thus Í g(x) dax = 1, and g(x) is a density on X;. From this definition 
Xi 

and (1) we obtain 

@ |f. M Df) dx — Mow | 
Xı 


|= of a1, VG) dx — M, v) 


ll 


ato,v)| <A fl g(a) dx = 4 
1 


From (2) and (3), by the triangle inequality, we get 
(4) | M(f, y) — MG@, y) | < 24e 
Our aim is to find a mixed strategy ¢ such that, for all y, 


(5) [MG -MEw|<e 
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Because of (4), it will be sufficient to find a £ such that, for all y, 
(6) | Mg, Y) — Mi, y) |< 


Let K be the upper bound of g. From the definition of a density 
function and the hypothesis of the theorem, we see that one may choose 
a subdivision Ri, ---, Ra of X, such that, for all y, 


O Els ace») int arte, nif ae < $ 


xreR; 


Define £() = f o% dx. Let xı, +--+, Xn be any points in Ry, «+, Ra; 
Ine 


respectively, and let § be the mixed-strategy assigning probability (i) 
to X; for i = 1, ---, n. 
Then, for all y, 


(8) M(E, y) < > &(@) sup M(x, y) 
= xe Ri 

and B 

(9) ME v= XA inf M(x, y) 
i=l xeR; 

and M(g, y) satisfies the same inequalities. 

Define 
My) = sup M(x, y), my) = inf M(x, y) 
xe Ri xe Ri 


From (8) and (9) and the fact that M (g, 


y) also satisfies these inequali- 
ties, we infer 


a0) |Ma -MEV sE E@ My) — m:)] 
$ 

Now 

an 10 = foe ax < Kf. ax 


From (10), (11), and (7), we obtain 


| MG, v) = ME v) | < E A) — mo) f ax <e 
1 R; 

which completes the proof. 
Corollary. Let G = (X, Y, M) bea 


subsets of Sm, Sn and M is continu: 
player is an extended mixed strategy. 


game. If X,Y are closed, bounded 
ous, then every density for either 
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Proof. The function M is uniformly continuous on X X Y and 
therefore is uniformly (in y) integrable over X and uniformly (in x) 
integrable over Y. 

An example in which densities are not approximable by discrete mixed 
strategies is the following. Player I chooses a number x with 0 < x < 1, 
and player II chooses a finite set y of numbers (y1, +++, yn), 0 < y: <1. 
If the number chosen by I is in the set chosen by II, the payoff is zero; 
if not, I wins 1 unit. Thus X is the unit interval, Y consists of all finite 
subsets of the unit interval, and M(z, y) = 0 if rey, M(x, y) = 1 if 
x¢y. For any mixed strategy £ for I, M(E, yn) —> 0 asn — ©, where 
Yn is the set of the first n x’s in an ordering 21, v2, ---, of the countable 
x set for which (x) > 0. Thus A(£) = 0 for all ¢, and A* = 0. Also, 
for any n, M(x, n) = 1 for x, outside the countable set which is the 
union of those y’s for which n(y) > 0, so that T(n) = 1 for all n, v* = 1. 
The game has lower value 0, upper value 1. 

However, if f is the density function f(x) = 1 for 0 < x < 1, corre- 
sponding to choosing a number at random in the unit interval, M(f, y) 


1 
= f M(x, y) dx = 1 for all y, since, for a fixed yo, M(x, yo) = 1 ex- 
0 


cept on a finite set. Thus, if we insist that player I can really choose a 
number at random on the unit interval, the game has a value; if we re- 
strict his choice to discrete mixtures over finite sets, the game does not 
have a value. Now the difficulty here is not mathematical; we simply 
have two mathematical models, and the question is which describes 
more accurately a certain situation. In fact, in the general theory of 
games, even the requirement that a function T be e-approximable by a 
discrete mixed strategy, in order for T to be an extended mixed strategy, 
is usually relaxed. 

We turn now to an example of a different kind, a game that simply 
fails to have a value, even if we permit arbitrary probability distribu- 
tions as mixed strategies. The game is this: Players I and II choose 
Positive integers, and whoever chooses the larger integer wins. Thus 
X = Y = the set of positive integers, 


Me, y) = 1 if >y 
= 0 if t=y 


—1 if z<y 


ll 


Now the most general probability distribution over X is a sequence 
£ = (&(1), £(2), ---) with (i) > 0, ZE@) = 1, where £(¢) is the prob- 
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ability that x = i, and similarly for Y. For such a £, we have 


My) = D EO - » li) 
i>y ty 


As y > », M(, y) > —1, so that AQ) = —1, A* = —1. Similarly 
T(n) = 1 for all y and v* = 1. 


2.7. Solving Games 


There are various methods for solving a game, i.e., finding the value 
and the good strategies for both players. In principle, any finite game 
can be solved by solving a finite number of systems of linear equations, 
as will be noted below, but no universally valid method is known for 
solving infinite games. However, if a pair of strategies £*, n* is obtained 
in any way whatever, it is usually easy to evaluate A(E*) and T(n*); if 
A(E*) = T(n*), their common value is the value of the game, and §* 
and 7* are good strategies; even if A(E*) < T(n*), the value is some- 
where in the interval (A(é*), T(n*)), so that we have an upper bound, 
T(n*) — A(E*), on the loss that either player suffers by using &* or 7*, 
Thus, in considering methods for solving games, we need not be unduly 
troubled by the fact that a method cannot be guaranteed in advance to 
yield the solution; if a method often yields strategies (£*, n*) that can 
be verified directly to be solutions, it is worth knowing. There is such 
a method, suggested by the following 


Theorem 2.7.1. If the game G = (X, Y, 


, M) has a value v, and if &* 
is a good strategy for player I and n 


* is a good strategy for player II, 
then 
(a) E(x) = 0 
Iz: MC, n*) <v) 
and 
(b) *(y) =0 


(uz MEF, v) >v) 

; Proof. (We shall prove assertion a only; 
similar.) We have M(é*, y) > v for all Y, 
ever M(x, n*) < v for all x, so that M(é* 

U = {z: M(@, n*) < v}; 
Then 


(1) v 


the proof of assertion b is 
so that M(é*, n*) >v. How- 
19*) =v. Let 


Un = {e: M(x, n*) < v — 1/n} 


ll 


È &@)M(, 7*) 


zex 


DF (@)M (a, o*) + > £*(2)M (a, n*) 
zeC(U,) 


ll 


zeUn 
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z (0 = *) E F@+o D ee) 


NJ eUn ze C(Un) 


for alln. But 


(2) E e@+ È F@=1 
zeUn zeC(Un) 
so that 
68) vsv—+ D pa) 
NreUn 


and hence >> £*(x) = 0. Since this holds for all n, we have 2) £*(x) 
zeUn zeU 
= 0, as was to be shown. 
We remark that, if X is an m space and Y an n space, and if &* isa 
density which is an extended good strategy for player I and n* is a 
density which is an extended good strategy for player II, then 


o) f P@d=o 
{x: M(x, n*) <v) 
and 


@) f voay=o 

ty: (*, y) >v) 
The proof of the theorem has been stated in a way that easily transfers 
to the case of densities. 

The above theorem implies, for example, that, if Y is finite and II 
has a good strategy n* with n*(y) > 0 for all y, then any good strategy 
£* for I will have M(Ẹ*, y) = v for all y. Also, if Y = Sa, and n*(y) isa 
density such that 7*(y) > 0 in some interval In, then M(é*, y) = v for 
all yeJ,. Now in solving a game we will usually not know v, but we 
can often find those strategies for which M(Ẹ, y) is constant and can 
check to see whether ¢ is in fact a good strategy. This procedure is best 
explained by working out a few examples. 

Example 1. I is to look for II in one of the locations A, B, C. II is 
now at A but, at a cost of 1 unit, may move to B or C or, at a cost of 2 
move to D. The cost of moving, if any, is paid to I after 


units, may 
: ves an additional 3 units if he locates IT. 


the search is over, and he recei 
Solution. The matrix is 


ABCD 
A J3 11 2 
B 041 2 
Cc 014 2 
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We look for strategies 7 with M(z, n) = constant; i.e., we solve the 


system 
3n(1) + (2) + n(8) + 2n(4) =k 


4n(2) + (8) + 2n(4) = k 
n(2) + 4n(8) + 2n(4) = k 
nl) + (2) + 7(83)+ (4) =1 


There are two linearly independent solutions m = (1/3, 1/3, 1/3, 0) 
and q = (0, 0, 0, 1), with M(x, mı) = 5/3, M(z, m2) = 2. Thus me is 
not the solution; if m is, we must have M(Ẹ*, y) = 5/3 for any good 
strategy §* for I and y = A, B, or C. This leads to the system 


3&(1) = 5/3 
ECL) + 4&(2) + &(8) = 5/3 
E) + &(2) + 4&3) = 5/3 
E) + &2)+ £3) =1 
which has the solution §* = (5/9, 2/9, 2/9). Thus v = 5/3 and &*, 1, 


are the unique good strategies. 


Example 2. A is to arrive at a station at a time ¢ selected at random 
in (0, 1) and waits there until time 1 or until I or II arrives. Each of I, 
II chooses a time in (0, 1) to appear at the station ; he looks for A and 
departs immediately, with A if A is present. If either I or II succeeds 
in meeting A, he receives one unit from the other. 

Solution. If x, y are the times chosen by I, II, the payoff is 


M(z, y) = x — (y — 1) =22-y for <y 


=0 for tc=y 
(@—y)-y=a-2y for z>y 


The payoff is discontinuous, but we shall assume that the game has a 
value and that any densit 


y is an extended mixed strategy. We shall 
therefore look for a density £ for which M (E, y) = constant. 


1 . e 


Y 1 
-f (@ + y)E(x) dx +f (@ — 2y)E(x) dx 
0 0 
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. PM ae 
If M(é, y) is constant, ae = 0. This yields 
y? 


a g 
_ fyé(y)] + 0) = 0; 2 +3=0; &y) = ky 


1 
Since f æT% dz is infinite, we cannot use a density ka—* throughout 
o 
(0, 1). Since arriving very early is unlikely to succeed anyway, we try 
&,(x) = 0 for «<a 


kaz% for a<a<l 
where 0 < a < 1, and 


Fora <y <1, 
M(Ea, Y) = Qhal(2 — a~4)y — 2a% + 1] 
Thus M (ëa, y) can be made constant for a < y < 1 by setting the co- 


efficient of y equal to zero and solving for a. This yields a = 1/4, and 


hence 
(e) = 0 for x 


< 
je% for $<a<1 
We also have 
M(é€,y) = 0 for $<yS1 
M, y) =4—-2y for ys 


The same density n* for II yields 

M(z, 1*) = 0 for $<2<1 

M(x, n) =22—-4 for s< 
Thus ¢*, 7* are good strategies for I, II; the value of the game is, of 
course, zero. j 

For games in which one player has only two or three pure strategies, 

direct geometric methods are useful. For instance, if player I has only 
two pure strategies, 1, 2, each strategy y of II may be represented by 
the point s(y) = [M(1, y), M(2, y)] in the plane. Let 5 denote the con- 
vex set determined by the points s(y). The point s* in S whose maxi- 
mum coordinate is minimized can be spotted directly; the maximum 
coordinate of s* is the value of the game, and any mixture of y’s for 
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which the same convex linear combination of s(y)’s yields s* is a good 
strategy for II. 

Example 3. I chooses a number į = 1 or 2. A number t is chosen at 
random in (0, 1), and the number z = #ť is announced to II. II then 
guesses which integer I chose; he pays I the amount 0 or 1, according 
as his guess is correct or not. 

Solution. A strategy for II is specified by a subset T of (0, 1) such 
that he guesses 1 when ze T. The payoff is 


Mü, T) =1 -fa 
T 


M(2, T) f a=—f t” dt 
(2, a (t:e T) 2 teT 


(Note that, for a fixed 7 = 1, 2, M is a function whose arguments are 
those subsets of (0, 1) for which the integral exists.) We need not plot 
the set of all points s(T) = (M(1, T), M(2, T)), since only the lower 
left boundary is of interest. For fixed (1, T) = a, the minimum value 
of M(2, T) over T clearly occurs when T is the interval (a, 1) and is 
(1 — a”). The lower boundary of the set of points s(T) is thus the 
curve y =1—2%,0<2<1. This curve is convex and is the lower 
boundary of the convex set S. The point whose maximum coordinate 
is minimum is the ea a of this curve with the line y = x and is 
=i = 0.382 approximately. The value of the 


game is v, and II has the good pure strategy T = (v, 1). The good 
strategy for I mixes 1, 2 in proportion to the coefficients of x and y in 
the equations of the tangent line to y = 1 — x” at (v, v). This equa- 
tion is x + (1 — ¢)y = v, where we write. = (¢, 1 — ¢) and 


(v, v), where v = 2 


-% 


v 


erez 


= 5-4 = 0.447 
approximately, so that I mixes 1, 2 in the proportions (0.447, 0.553). 


In games in which M(z, y) is convex in y for fixed x, the simplest 
procedure is often a direct evaluation of T(y). 


Example 4. X = Y = unit interval; M(x, y) = Jx—yl. Ty) = 

max (y, 1 — y), v* = min f(y) = T(1/2) = 1/2. Thus v = 1/2, and 
Y 

y = 1/2 is a good strategy for II. Since 0 and 1 are the only pure 


strategies x for I with M(x, 1/2) = 1/2, any good strategy ¢ for I is a 
mixture of 0 and 1. If (0) =ç, 
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MG, y) = ty + 0-90-49) A) = min (ç, 1 — §) 


Thus A(¢) = 1/2 only if ¢ = 1/2; the only good strategy for I mixes 0 
and 1 equally. 

In finite games there is a constructive method for obtaining the ex- 
treme points of the set =* (H*) of good strategies for player I (II) 
which is given below. It turns out that the set =* (H*) is the convex 
hull of a finite set of extreme points, so that, from Theorem 2.2.4, this 
method yields an explicit parametric representation of =* (H*). 

First we define simple solutions of arbitrary games. 


Definition 2.7.1. Let G = (X, Y, M) be a game, and let &* and 7* be 
mixed strategies such that 


M(E*, y) = M(x, n*) = constant 


for all z in X and y in Y. Then we call the couple (*, n*) if it exists, a 
simple solution of G. If G isa finite game with matrix A, we call (&*, n*) 


a simple solution of A. It is obvious that the constant is the value of 


the game and that &* and n* are good strategies. 


Theorem 2.7.2. Let A = (aij), % j=, +++, n be a nonsingular 
matrix and let A = (Az) be the adjoint of A. Then a necessary and 


sufficient condition that A have a simple solution is that the quantities 


all be of the same sign. 


Proof. (Sufficiency.) Assume that R; and C; are of the same sign, 


and consider the linear equation 


(1) AG) = Dak = (j =1, n) 
i=1 


where v is a nonzero constant of value not yet specified. Using Cramer’s 


rule, the solution of these equations is 


Ri no oe, 
(2) = Gabe , n) 
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where a = | A P the determinant of the matrix A. Now v can be chosen 
so that 
z È R: 
; dl 
(3) Di =v =i 
i=1 a 
that is R 
a a , i 
(4) v= (i) = 


E IA; 
> Ri J J 

Since we assume that all R; are of the same sign, then, by (4), £(z) > 0 
for all ¢ and, in view of (3), £ = (E(1), ---, (n)) e Z. 


In a similar way we conclude that q = (n(1), +++, a(n)) £ H, where 
1(j) = C;/ZZA;; is a solution of the equations 


(5) Ali, n) = D aim) =v, @ = 1, +++, n) 
j=l 
But, as above, 
(6) j a a 
v= = =v 
z 2DA;; 
XC; 
j=l 


so that A(E, j) = A(i, n) = v for alli and J 
(Necessity.) To prove the necessity, we are given 


O LaO =v, G=1, n); O20 Dey =1 


i=1 


ÈD aili) =” (i=1,---, n); n(i) > 0, Dal) =1 
Bh j=1 


and, consequently, 
Ri C; 
8) W= e ya 
DZA; 1O) DZA; ° DTA; 
and, since &(7) > 0 and (3) > 0, we must have all Ri and all C} of the 
same sign as )) > Ajj. 


i=1 j=1 

Note that, if A is nonsingular and admits a simple solution, the good 
ia for the two players and the value of the game are given by 
8). 


a 
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Corollary. If a nonsingular square matrix admits a simple solution, 
then the solution is unique. 


Proof. The proof follows from the fact that, since matrix A is non- 
singular, the sets of equations 


DX atli) = v; È amli) =v 
jal 


i=l 
have a unique solution. 


Theorem 2.7.3. Let A be an m X n matrix, such that the value v of 
the game with matrix A is not zero. Let =* (H*) be the set of all good 
strategies for Player I (II) in the finite game whose matrix is A. Then 
a necessary and sufficient condition that a §* in =* and an y* in H* be 
extreme points of Z* and H*, respectively, is that A contain a non- 
singular submatrix B such that (§*3, n*s) is a simple solution of B— 
where &*, is the vector obtained from §* by deleting the components 
corresponding to the rows deleted to obtain B from A, and 7*g is the 
vector obtained from * by deleting the components corresponding to 


the columns deleted to obtain B from A. 


Proof. (Sufficiency.) We assume that there is an r X r nonsingular 
matrix B in A, such that (Ẹ*g, n*s) is a simple solution of B, and show 
that this implies that Ẹ* and ņ* are extreme points of Z* and H*, re- 
spectively. Assume the contrary; then there exist &, £” © =* with 
& = E” such that 


(1) Et = AE + (1 aE” 


0<X <1. Let Js represent the row subscripts and Jg the column 
subscripts of B. Then, by hypothesis, 


(2) v = B(E*s, j) = BEB, J) + (1 — )BE"s, j) 


for alljeJp. Butv > B(&'s, j) and v > B(§"s, j) for j e Jz, and hence 
it must follow that 


(3) v = Bes, J) = BE"s, j) for jee 


and (€'g, n*z), (£2, n*s) are simple solutions of B, so that by corollary 
to Theorem 2.7.2 


(4) Eg = Ep 


Since by hypothesis (§*z, 1*s) is a simple solution of B, the components 
of £* corresponding to rows of A not in B are all zero. Hence we con- 
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clude from (1) and (4) that 


65) E = E” 


which contradicts the assumption that & = £”. The proof that ņ* is an 
extreme point of H* is similar. 

(Necessity.) Let Jı and Iz be the sets of row subscripts of A such 
that &*(7) > 0 for i e I, and A (i, n*) = v fori e I where v is the value of 
the game. Similarly let Jı and Jz be sets of column subscripts of A 
such that n*(J) > 0 for j e Jı and A(§*, j) = vforjeJs. Then IoD qh, 
J2 Jı. Now in the submatrix A, of A with i e Jo, j e Jo, the row vec- 
tors r; for te I4, are linearly independent; for suppose there exists a 
non-zero vector t with t-c; = 0 for all jeJs, where t; = 0 for igl, 
and where cy is the jth column vector of As. Then, since 


1 
I 
(6) È n*c = v8, ô = 
jeJa 
1 
we have 
(7) 0=t X MGc) = olta’) 
jeJ: 
where for any column vector w, w’ represents the corresponding raw 
vector. 
Since v = 0, t-8’ = 0. Thus 
(8) E9 cpie 


is a mixed strategy for | € | sufficiently small, Moreover, for all je Jo, 
9 A(E 3) = er 5 is 
( ) È ,J) È a3;(€*(2) + eti) =v+e x ait; =v 


iel 


and, since A(E*, j) > v for jgJ AE, j ; 
$ , 2, J) > 0 for 7 ¢ J» and | e| suffi- 
ciently small. Thus there is an ey > 0 for which £ and gya good 


strategies for player I, with &* = 2(8 + Ea) contradicting the 


hypothesis that §* is an extreme point in =*. Similarly, the column vec- 


tors cj of Az are linearly independent for jeJ}. Let r be the rank of 
Ag. Then 


(10) 72 max [N(7;), N(J,)] 
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where for any finite set S the symbol N(S) stands for the number of 
elements in S. Thus there exists an r X 7 nonsingular submatrix B in 
Ag with Ig D Iı, Jp D Jı, where Ig and Jg are the sets of i, j determin- 
ing the matrix B. We also have 


(11) B(E*n, j) = BG, n*s) = v 


for i e Ip, j e Jg, which completes the proof of the theorem. 

The restriction in Theorem 2.7.3 that v Æ 0 is unimportant, since by 
adding the same constant to all elements of A we obtain a new matrix 
all of whose elements are positive, and its associated game has exactly 
the same sets of good strategies as that of A. Thus Theorems 2.7.2 and 
2.7.3 give us a fairly complete theory for the solutions of finite games. 
We examine all square submatrices of A for simple solutions; that is, 
we test them to see whether all R; and C; are of the same sign. For 
each submatrix where this is true, we compute the simple solutions 
according to the method of Theorem 2.7.2. We then try these solutions 
to see if they determine good strategies for both players, i.e., whether 
A(E*, j) < v for all j and A(i, n*) 2 v for all t. Tf the answer is in the 
affirmative, we obtain two extreme points, one in =* and one in H*. 
Once all the extreme points have been thus obtained, we can then give 
a complete characterization of Z* and H*. f 

We have seen in Chapter 1 that in games of perfect information the 
good strategies and value may often be obtained by an inductive process. 
The same method may sometimes be applied with success to games that 
cannot be strictly classified as games of perfect information. An inter- 
esting instance of this is given in the following example: oe 

Example 5. In an archery contest, two contestants of equal skill (i.e. 
equal accuracy) approach their respective targets from the same dis- 
tance and with the same speed. Contestant I has m arrows, contestant 
II n arrows; also, each knows how many arrows the other has and when 
his opponent shoots. We assume that at the start the probability of 
hitting the bull’s eye is a, where a <1/(m +n), and increases to 1 
monotonically as the two contestants approach their targets. We can 
describe the location of a contestant by the probability of a hit. If I 


scores before II, then II pays I one unit, and the game is over. If II 


scores before I, then I pays II one unit, and the game is over. If both 


contestants exhaust their arrows without scoring, or if both score si- 
multaneously, then the payoff is 0. We denote this game by G (a, m, n). 
The problem is: At what points x should contestant I shoot his arrows, 
and at what points y should contestant II shoot his arrows, provided 


neither has yet scored? 
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Solution. Let us first consider the case m = n = 1, i.e., the game 
G(a, 1, 1). The payoff function M is 


(1) Miz, y)=x—-—(l1—2)y if <y 
0 if w=y 
(l-ye-y if y>s 


It is clear that I gains by waiting, so long as II does not shoot. How- 
ever, if I waits past x = 1/2, II may shoot before I and make the ex- 
pected payoff negative. This suggests that the best I can do (if II has 
not shot) is to shoot at x = 1/2, which yields an expected payoff of 0, 
and II can prevent him from gaining more by also shooting at that dis- 
tance. We easily verify in fact that x = y = 1/2 is a saddle point; i.e., 


(2) M(x, 1/2) < M(1/2, 1/2) < M(1/2, y) 


for all z and y, and thus x = 1/2 is a good strategy for I, y = 1/2 isa 
good strategy for II, and the value of the game is of course 0. 

For arbitrary m and n, neither player has a good strategy in G(a, m, n) 
but both have e-good strategies [see Definition 2.3.1]. In order to under- 
stand how to “guess” a correct solution for the general case, we suggest 
that the reader work out the case for m = 1, n = 2 and then generalize 
on these results. We begin the study of the general case by describing a 
certain strategy for each player in G(a, m, n). 

For any e > 0 and for m and n positive integers, we describe induc- 
tively the designated strategy S,(e, m, n) for player I in G(a, m, n). 
For I to use S;(e, 1, 1) in G(a, 1, 1) is to play as follows: If II does not 
shoot before y = 1/2, then I shoots at x = 1/2; if II shoots before y 
= 1/2, then I shoots at x = 1. To use S,(e, m, n) in G(a, m, n) is for I 
to play as follows: 


Case 1. m>n. (a) If II shoots at y < 1/(m + n), then I uses 81(e, 
m, n — 1) in the game G(y, m, n — 1). 


1 
(b) If II does not shoot before aE player I shoots at a = 7 and 
then uses strategy Sı (e, m — 1, n) in the game G ( £ m—1, n) or 
m+n’ 


r 1 
Si(e.m—1,n—1)inG a par 1, n — 1 ) according as II does 


not or does shoot at 


m+n 
Case 2. m <n. I chooses 6 such that 
2 — 
(3) è= min K oes De 1 ] 
2m (m + n)(m +n — 1) 
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and uses the rectangular distribution over the closed interval é y 7m? 


Paa + °) to pick a point To in this interval before each play of the 


game. 
(a) If II shoots at y < to, then I uses S;(¢/2, m, n — 1) in the game 
Gy, m, n — 1). 


(b) If II does not shoot before zo, I shoots at xo and then uses S(¢/2, 
m — 1, n) in the game G(x, m — 1, n). (Of course, if II shoots at zo, I 
uses S(¢/2, m — 1, n — 1) in G(x, m — 1, n — 1), but the probability 
of this happening is zero.) 

An edesignated strategy for player II is described in a similar way 
with m and n interchanged. 

We now prove that for every «> 0, the e-designated strategy for 
player I (player II) is an «good strategy for player I (player IT) in the 
game G(a, m, n) and that the value of G(a, m, n) is (m — n)/(m + n). 

We introduce the notation E(e, m, n) to designate the least amount 
that player I can expect to get if he uses the e-designated strategy 
Si(e, m, n) in the game G(a, m, n). Since the proof for player II is ex- 
actly similar to the proof for player I, it will be sufficient to prove that, 
against any strategy adopted by Il, 

m—n 


m+n 


m-n. We see from (1) and (2) 
d, a fortiori an e-good strategy in 


-=g 


(4) Ble, m, n) 2 


The proof proceeds by induction on 
that S,(¢, 1, 1) is a good strategy an 
G(a, 1, 1), and thus 

E(e 1,1) >0-«€ 
We assume that the theorem is proved for m +n < k, and prove it for 


mt+n=k. 
Case 1. m>n. (a) II shoots at y < 1/(m + n). 


Ele, m, n) > —y + (0 — DEC m n= D) 
= E(e m, n — 1) — y(l + Ble, m, n)) 
= 1 
> EL fo) 
= m+n- 1 m+n mtn—1 
m—n 
ee 
(b1) II shoots at y = 1/(m +7) 


— E 
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Ele m, n) > (1 — y) — yl — x) + (1 — z) — y)E(e,m — 1,n — 1) 


1 ae m—n ) 
= € 
>(1 m+n mtn—2 
n=") (m+n -— 1} picon 
€ 
6 +n/ (m+ n)(m +n — 2) m+n 
(by) II shoots at y > 1/(m + n). 


m 


Ele, m, n) > z + (1 — x)E(e, m — 1, n) 


1 1 mm =n— I m=n 
> H(i ej] > — e 
“m+n m+tn/\n+n—-1 m+n 


Case 2. m <n. (a) II shoots at y < 1/(m + n). 


E(e,m,n) > —y + (1 — y)E(€/2, m, n — 1) 
Sia 1 (ees 9) 
m+1 m+n m+n=1 2 
m—n 


mtn 
(b) II shoots at yo = [1/(m + n)] +6,0 <A <1. 


1 vo 1 vo € 
E(e,m,n) zalf dr +(1 -=f za) #(é, m= 1, n)| 
AG Jur V6 Jy0—r5 2 


+0-»[-v+a-e(S m,n —1)| 
> <+a{( : *) 
2 mau E 
+(1 _ 1 *) (2 ‘)] 
m+n 2/\m+n-—-1 


+a-»[- 


=e 


— 6b 
mtn 


( 1 eae! 
fi =i )| 
m+n mtn-1 
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e m—n =n- 
ke | (1-2 n -) 
2 m+n 2 m+tn—-1 
m—n+1 
- a-h). 
m+n-—l 


To complete the proof of case 2(b), we need only show that, for all A 
with0 SA <1, 


a? m—-n—-1l m m—nt+1 ë 
—(1- —(’—»’) 14 2—"** Ns -5 
2 mtn-1 mtn—-1 D) 


It is easy to show that the coefficient of ô is minimal at 


A = m/(2m + n) 
Consequently, it suffices to prove that 
m? m-—n-—1 
.2(2m + n)” m+n-1 
1 m—-n+l 
-( m \(1- n \(i+ Vise —< 
2n +n Q2n+n mtn—-l 2 
Rearranging and factoring (5), we obtain 


2 
m ) ( : ) em + m6 2 ae 
mtn—-1 2 


2m +n 


and, using (3), we get the desired result. 
(c) II shoots at y > 1/(m + n) +o = Yi 


ire A ax) E($ -1 
E(e, m, naif zat (1-3 he s Sie yt 


1 3 1 -\( 3) 
> wet limos a/\m+tn—1 2 


—~ m+n 
,—=n— 1 m= 
aor yia e 
m+n 2 2 m+n-—1 m+n 


which completes the proof. 
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PROBLEMS 


2.7.1. Solve the games with the following matrices: 


(a) J] 2 0 0 (b)}1 2 3 4 
0 bo 23 #21 
00 c n 1 2 

4123 

(ce) 5 382 1 (d) aj =i =j 47=1,2,+++,n 
1 41 0 (e) ag =1 if |i—-j|=0 or 1, 
-3 23 -1 j=l, n 
—2 -1 3 2 aij =0 otherwise. 
-1 01 2 


2.7.2. Solve the following game: 


X=O<2<l, Y=(0<y<1), M(z, y) = ¢le/y) 
where g(t) = 1 if the first non-zero digit in the decimal expansion of ¢ is 1, g(t) = 0 
otherwise. 

2.7.3. Solve the following game: 


X=0<2<1, Y=O<y<, Mz, 1) = |z- y|- -y 


2.7.4. In Example 5 of this section, solve the game for m = n = 1 when (a) the 
two contestants are separated by a partition so that neither knows whether the 
opponent has shot his arrow, (b) contestant I has this information but not contestant 
II. Assume in either case that the probability of hitting a bull’ 


s eye at the start is 
zero. Also, assume in part b that contestant II will delay his shot until the end with 
positive probability. 


CHAPTER 3 


General Structure 


of Statistical Games 


3.1. Introduction 


In the previous two chapters we considered general two-person, 
zero-sum games defined by a triple (X, Y, M ) where X and Y are the 
Spaces of pure strategies for the two players and M is a real-valued func- 
tion defined on X X Y and represents the payoff to one of the players. 
What characterizes a particular game, or & given class of games, is the 
Structure assigned to X and Y and the form given to M. One kind of 
Specialization of these spaces and payoff functions leads to a class of 
games known as statistical games. The study of such games will be the 
concern of the remainder of this book. 

In statistical games the two players will be referred to as nature and 
the statistician. Nature will be designated as player I and the statisti- 
Cian as player II. The fact that, in statistical games, nature cannot be 
Considered as a conscious opponent who can take advantage of mistakes 
made by the statistician raises the important and as yet unsolved prob- 
lem of what constitutes a tational way of playing games of this type. 
Various aspects of this problem will be considered in Chapters 4 and 5. 

he existence of such problems does not, however, in any way change 
the basic fact that statistics can be viewed as a game against nature, 
and this viewpoint has already led to important advances in statistics. 

In this chapter we will not deal with any specific class of games of a 
Statistical nature but rather will attempt to describe certain general 
features common to all such games. We will also introduce some basic 
Probability concepts which are essential to the understanding of the 
Structure of such games. The study of special classes of statistical games 
Which arise in a variety of statistical-decision situations will be under- 
taken in the last six chapters of this book. ; 

Before we embark on a discussion of statistical games, we introduce a 
ew mathematical notions and notations which will be employed in this 


and subsequent chapters of this book. 
75 
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Let y be a function which is defined on a product space A X B. 
Then by gp, for b e B, we mean the function defined on A such that 


pela) = g(a, b) 
for all a in A. ; ie 
If y and f are functions such that the domain of definition of con- 
tains the range of f and the domain of f is a space Z, then by the compo- 
sition of gy and f, written y O f, we mean the function h such that for all 
zeZ 
h) = off@)] 


By a partition of a space Z we mean a division of Z into mutually exclu- 
sive sets whose union is Z. A partition will be designated by the symbol 
8 and a set of partitions by the symbol ©. A function f defined on Z 
determines a partition Sy of sets Sa defined by 


Sa = {z: f(z) = a} 


where a is an element in the range of f. A 
Let Y be any space and y any element of it; then by the symbol {y}- 
we shall mean the set in Y consisting of the single element y. However, 


if y has a subscript or superscript, this symbol will designate a sequence 
of elements of Y. 


3.2. The Sample Space 


In the background of each statistical game there is postulated a 
sample space which describes all conceivable outcomes of an experiment 
that the statistician can perform. The possibility of “spying” on the 
opponent by performing experiments is a distinguishing characteristic 
of all statistical games. The manner in which this spying is done will 
form a basic part of the statistician’s strategy, as will be described later. 

The structure of the space Z of outcomes of an experiment can be 
simple or exceedingly complex. An example of a relatively simple 
space is one in which the elements are the 2” possible outcomes of n 
tosses of a coin. Each point in this space is a sequence of n “heads” 
and “tails” of the form HTH --- T. An example of a complex space 
is one in which the elements are possible temperatures that can be re- 
corded during a 24-hour period in a given locality. A point in this space 
is a function z with values z (t) representing the temperature at time t. 

A set S of points of the space of outcomes is called an event, Thus, in 
the coin-tossing experiment, the set of he 
tains exactly m heads and n — m tails is an event. In the temperature- 


points each of which con- 
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recording experiment, a possible event might be the set of all functions 
z with max z(t) < 80° F. 


t 
Associated with the space Z is a parameter space Q with elements w, 
and a function p defined on Z X © such that, for a fixed w, Pa is a prob- 
ability distribution on Z; i.e., po is a non-negative function defined on 
Z with p.(z) = 0 except on a countable set, and x F= 1. [See 


Definition 1.8.1.] The set 2 can also be fruitfully aons dered as an index 
set for the class of probability distributions on Z. The elements of Q 
need not be real numbers or vectors, although in many statistical prob- 
lems Q is, in fact, a subset of n space. 

We now formally define the notion of a sample space. 

Definition 3.2.1. Let Z and Q be two non-empty sets, and let p be a 
function defined on Z X Q such that, for a fixed w £ 9, p, is a probability 
distribution on Z. Then the triple Z = (Z, 9, p) is called a sample 
space. 

Though formally the sample space Z is defined as a triple, we do not 
always make the distinction between it and the first element of the 
triple. Thus we speak of an event as a set in the sample space when in 
fact it is a subset of Z. This ambiguity will cause no difficulty and 
helps to emphasize that an event is associated with a set of probabilities, 
one for each we. For any w £Q and any event S C Z, the probability 
of the event S is given by 


PAS) = È pal) 


From the point of view of statistical games, the elements w in Q 
constitute the pure strategies for nature. Thus the space 2 corresponds 
to the space X in a general game. An element in 2 will sometimes be 
referred to as a stale of nature. For any w €Q, the value of the function 
Po Will be denoted by pa(z) and also by p(z | w). We define 


Po = {Po:w E9} 
It is easily verified that for every S C Z and every we 9 
(a) 0<P.(S8) <1 
(b) If SCT, then P.(S) < P.(T) 


(e) P.(S) + Pa(C(S)) = 1 
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where C(S) stands for the complement of the set S, and 

(a) P.(S U T) < Po(S) + P.(T) 

with equality holding if S and T are disjoint. We also have the following 


Theorem 3.2.1. Let Z = (Z, 9, p) be a sample space. For every w 
in Q and every sequence of disjoint events S1, So, --- 


Proof. a (Ù s) a x Pa(Si) 


P.(U 8) = Z n= E X ne = EPS) 
i=l zeU si i zes; i 


3.3. The Space of Pure Strategies for the Statistician 


The structure of the space of the statistician’s pure strategies is some- 
what more complex than that for nature. To begin with, the statistician 
has at his disposal a class A of possible actions which he can take (or 
decisions he can make) in the face of the (to him) unknown state of 
nature w. If he decides to take an action without experimentation, we 
assume that he incurs a numerical loss L(w, a), a known function of the 
state of nature w and the action a which he selects from A. However, 
the possibility of performing experiments and thus reducing the loss by 
gaining at least partial information about w is open to the statistician. 
The possibility greatly enlarges the class of strategies that he can 
employ. He must now decide which experiments he is to perform, in 
what sequence he is to perform them, when he is to terminate experimen- 
tation, and what action he is to take once experimentation is terminated. 

What prevents the statistician from getting full knowledge of w by 
unlimited experimentation is the cost of the experiments. This cost 
may depend linearly or otherwise on the number and kind of experi- 
ments he performs, or it may also depend on the outcome of the experi- 
ments. However this cost is measured, it is an essential factor which 
enters in his choice of a strategy. 

What then is a strategy for the statistician? Before we can answer 
this question, we must introduce the notion of a random variable and 
the new sample space to which it gives rise. 


3.4. The Notion of a Random Variable 


Previously it was stated that in the background of each statistical 
game there is a sample space Z. Because of cost considerations and 
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other reasons which will be made clear in Chapter 8, the experimenter 
often will not observe the elements of Z directly but rather a numerical 
or vector-valued function defined on Z where by a vector-valued function 
is meant a function whose values are points in n space. Thus, for ex- 
ample, in the coin-tossing experiment, the statistician may not observe 
the actual sequence of heads and tails but rather the values of the func- 
tion f where f(z) represents the number of heads in the sequence, or the 
values of the function g, where g(z) represents the number of runs in 
the sequence, or the values of the vector function h, where h(z) = (f(z) i 
g(z)). Similarly in the temperature-recording experiment he may ob- 
serve not z(t) but rather the values of the function f, where 


1 fe 
J 2(t) dt 
to = ti 4 


is the mean temperature in the time interval (tı, t2); or the values of the 
function g, where g(z) = max z(¢)—the highest temperature in a 24-hour 
t 


Je) = 


period; or the values of the vector function h, where 

h(z) = (2(1), 2(2), +++, 2(24)) 
and where z(j) is the temperature reading at the jth hour of the day. 
The functions f, g, h of these examples are random variables. 

Random variables are usually thought of as being numerical or 
vector-valued. In many statistical situations, however, a more general 
point of view is desirable. Thus, for example, a rule that assigns to 
each outcome of an experiment a decision, out of a class of possible 
decisions, is clearly a random variable in the same sense as the above 
functions f, g, and h, even when the decisions are not expressed as 
numbers. Thus we are led to the following formal definition. 


Definition 3.4.1. Let Z = (Z, 2, p) be a sample space. A function 
defined on Z is called a random variable. 

Let f be a random variable. Then f generates a new sample space 
X = (X, 9, g) where X is the range of f and gw is the probability distri- 
bution of f induced by py. (The class of probability distributions qu for 
each w in Q will be denoted by Qa.) Thus we see that a choice of a 
random variable f is equivalent to a choice of a sample space X which 
in turn is equivalent to a choice of an experiment whose outcomes are 
the elements of X. The meaning of the symbol g will be made more 
precise by Theorem 3.4.1 and the definitions that precede it. 


Definition 3.4.2. Let Z = (Z, Q, p) be a sample space, let fi, ---, fm 
be random variables defined on Z, let the sets By, ---, Bm be the ranges 
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of fı, +*+, fm, respectively, and let B be the product space of By, +- +, Bm. 
Then for any w in Q the function qw, such that for every b in B 


qu(b) = qu(bi, +++, bm) = Pofz: fi) = bi, +++; f(z) = bm} 
is called the joint probability distribution of fı, ---, fm, given w. 


Definition 3.4.3. Let Z = (Z, 9, p) be a sample space, and let f be a 
vector-valued random variable defined on Z. Then for any w in Q if 


the series 
È IIO |p.) 
zeZ 

converges, the vector 


xs @)pa(2) 


is called the expectation (mean or expected value) of f, given w (where 
as in Chapter 2, | f(z) | stands for Vf.2(2) + --- + fi2(2) and f,(2), 
+++, f(z) are the components of the vector f(z)). This vector is denoted 
by E.(f). 


Theorem 3.4.1. Let Z = (Z, 2, p) be a sample space, let f be a ran- 
dom variable defined on Z, and let ¢ be a vector-valued function whose 
domain of definition includes the range B of f. Then, for any w in Q, 


E.(g Of) = È e)a) 


where qu is the probability distribution of f given w. 
Proof. 


E.(¢ of) = X o(f@))pole) 
=Xe) È pl) =D vad) 
deB { } beB 


z: f(z)=b 

Theorem 3.4.1 asserts that it is sufficient to know the probability distri- 
bution of f to compute the expectation of any function of the values 
of f. In particular, if g is the characteristic function of a set S in X , Le, 
glf()] = 1 if f(z) eS and gff(2] = 0 if f(z) ¢8, the probability of an 
event S can be computed without any reference to the space Z. In 
subsequent discussions it will entail no loss of generality and it will 
sometimes be convenient to take as our sample space the triple X = 
(X, 2, g). When f is vector-valued, which is often the case in statistics, 
its range X is a subset of n space. 

As a final remark, we wish to point out that any random variable i; 
and hence any experiment, may be viewed as a partition $y of the space 
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Z into sets S, defined by 


S: = {2:f@ = 2} 
where xe X. 


3.5. Statistician’s Space of Strategies for Single Experiments 


We now turn to the problem of characterizing the structure of the 
space of strategies for the statistician. For expository reasons it will be 
convenient first to treat the special case where the statistician is re- 
stricted to performing a single experiment. Such an experiment might 
consist of firing N bullets at a target and observing the distance of each 
shot from the point of aim, planting k varieties of wheat on r plots each 
and measuring the mean yield per plot for each variety, counting the 
number of bacteria on each of N Petri dishes, measuring during a period 
of a year the weekly gain in weight of N animals which were fed a special 
diet, and so forth. A single experiment usually consists of a finite num- 
ber of subexperiments, and the outcome of a subexperiment may itself 
be a vector with a fixed number of components. What makes an ex- 
periment a single experiment is the fact that a prescribed number of 
subexperiments are performed. A single experiment is often referred to 
as a fixed sample-size experiment, and the number of components in the 
resulting vector is called the size of the sample. 

As was previously indicated, the space of pure strategies for the 
statistician in a statistical game with no experimentation is simply a 
space A whose elements are the possible actions that he can take. In 
a game with a single experiment, the number of strategies for the statis- 
tician becomes vastly increased, for he must now decide on a rule that 
will associate with each possible outcome of the experiment a point a 
in A. Such a rule is a decision function, which is defined as follows. 


Definition 3.5.1. Let Z = (Z, 9, p) bea sample space, and let A be 
an arbitrary space of actions or decisions. Then a function d which is 
defined on Z and which maps Z into A is called a decision function. 
The class D of decision functions is the class of pure strategies for the 
statistician. 

We remark that a decision function d in D is a random variable, and 
it can be considered as a partition of the set Z into mutually exclusive 
subsets 

Sa = {z:d(z) = a} 


whose union is Z. If the outcome of a single experiment is an element 
in Sa, action a is taken. Thus, for example, if A consists of two elements 
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a; and ag, then a strategy d is a division of Z into a set S, sometimes 
called a critical region, and its complement C(S) such that, if z e S, action 
a, is taken, and, if z e C(S), action az is taken. 

In order to define a statistical game with a single experiment, we need 
to introduce only two more concepts, that of a loss function and that 
of a payoff or risk function. These two concepts, as well as those already 
defined, are illustrated in the example following Definition 3.5.4. These 
concepts will be treated in greater detail in the next chapter, but formal 
definitions are given here for the sake of completeness. 


Definition 3.6.2. Let Z = (Z, Q, p) be a sample space, and let A be 
an arbitrary space of actions. Then a bounded numerical function L 
defined on the product space 2 X A with values L(w, a) is called a loss 
function. 


In the definition of risk function we use the composition of the func- 
tions La and d, where L is a loss function and d a decision function. 


Definition 3.5.3. Let Z = (Z, 2, p) be a sample space, let A be an 
arbitrary space of actions, let D be a class of decision functions mapping 
Z into A, and let L be a loss function defined on Q X A. Then the pay- 
off or risk function is the function p defined on 2 X D such that 


p(w, d) = 2 L(w, d@))pa(2) 


We remark that, since the loss function L is bounded, the risk function p 
is bounded, and the number p(w, d) always exists. 

In view of the foregoing discussion, we are now in a position to give a 
general definition of a statistical game with a single experiment. 


Definition 3.5.4. Let Z = (Z, 9, p) be a sample space, let A be an 
arbitrary space of actions, let D be a class of decision functions mapping 
Z into A, and let p be a risk function defined on Q X D. Then the game 


G = (Q, D, p) is called a statistical game with a single experiment (or a 
fixed sample-size game). 


The following is an illustration of a statistical game with a single 
experiment. 

A chemist desires to estimate the average number of inert particles 
per cubic centimeter of a liquid produced by a chemical process. It is 
assumed that the number z2 of particles in any volume v of the liquid 
produced by the process has a Poisson distribution. That is to say, 


(COK 


(1) p(z | w) Ta EREA 
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where w is the expected number of particles per cubic centimeter of the 
liquid. The chemist decides to base his estimate on the number of 
particles he finds in a microscopic examination of a specified volume v 
of the liquid. He measures the loss by a multiple of the square of the 
difference between his estimated value of w and the true value of w 
which is unknown to him. 

In this example, the space Z consists of all non-negative integers. 
Nature’s space of pure strategies 2 is the non-negative half of the real 
line. The sample space is the triple (Z, 9, p) where p is given by (1). 
The space of decisions A consists of all real numbers a with 0 < a < œ, 
For any a and v, the loss function L is defined by 


(2) Llo, a) = kw — a)? 
where k is a positive constant. A decision function d is a rule, which to 
each possible number z of particles counted in a selected volume v 
assigns a number a in A which the chemist is to consider as an estimate 
of w. Thus a possible decision function d is 
(3) d(z) = z/v 

For any decision function d, the risk function p for this problem is 
given by 


ie a (wr) em 
(4) p(w, d) =k >= [o — d(z)]? a 
z=0 i 
If, for example, d is defined by (3), then 
k 2 (wr) zgor kw 
p(w, d) = a x (wv — z)? = m 


which is linear in w. ; 
We observe that the loss function L defined by (2) is unbounded so 


that p(w, d) may not exist for all d. In many estimation problems [see 
Chapter 11] the requirement that L be bounded will often be replaced 
by the weaker requirement that L be bounded from below, i.e., that 
there exist a positive constant K such that L(, a) > —K for all a 
and w. 


3.6. Mixed Strategies for Single-Experiment Games 


Having defined a single-experiment game G, the definition of the 
mixed extension T = (Z, H, p) of G presents no difficulties and is, in 
fact, the same as that given in Definition 1.8.2 for general games. There 
is, however, a difference in nomenclature between general games and 
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statistical games. A mixed strategy for nature is, in statistical language, 
often called an a priori probability distribution for the states on which £ 
is defined, and the space = of mixed strategies is called the space of a 
priori probability distributions. Another concept which plays an impor- 
tant role in statistical games is that of an a posteriori probability distri- 
bution. To define a posteriori probability and the related concept of 
the a posteriori risk we need to introduce the more general notion of 
conditional probability and conditional expectation. 


Definition 3.6.1. Let S be an event in a sample space Z = (Z, Q, p) 
and let f be any vector-valued random variable defined on Z. For any 
w gQ for which P,,(S) > 0, the vector 


S@)pe(2) 
3.6.1 a 
Pa ZRO 
is called the conditional expectation of f, given S and w, provided 
> |E) |po() is finite, and is denoted by Z.(f| S). If f is the charac- 
zeS 


teristic function of a set T, the number Eo (f | S) is called the conditional 
probability of the event T, given S and w, and is denoted by P(T | 8). 


In statistical terminology, the quantity P.(7’ | S) is the probability 
that the outcome of an experiment will be a point in a set T in Z, given 
that the outcome is a point in a set S in Z and given that nature is in 
state w. Note that P.(T |S) exists for all w such that P.(S) > 0. Also 
PoP | S) = 0 unless T N Sis not empty. If T N Sis not empty, then 


> EZO 
3.6.2 P(T |S) = = 
86.2) os 
and, if T = S, Py(S | S) = 1. The above remarks are summarized in 
the following theorem: 


Theorem 3.6.1. Let S be an event in a sample space Z = (Z, 9, p), 
and let Qs consist of all w e Q such that P.(S) > 0. Assume 9s is not 
empty. Let p*.(z) = Po(z | S) for all we Qs. Then Z* = (Z, Qs, p*) is 
a sample space. If expectations on the new sample space are denoted 
by E*, then Z*,(f) = E(f | S) for all random variables f. 


The above theorem asserts among other things that conditional ex- 
pectations are simply expectations for a new sample space so that all 


theorems about expectations are valid also for conditional expectations. 
Thus, for instance 


Elf + f2) | S] = Eol | S) + Eu(fe | S) 
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Definition 3.6.2. Let Z = (Z, Q, p) be a sample space. For any par- 
tition $ of Z, any w £ 9, and any random variable f, the random variable 
h which for each z £ S e 8 has the constant value 


he) = E.(f| S) 


is called the conditional expectation of f with respect to the partition $ 
and is denoted by Eu(f| 8). 

Note that Z.,(f | 8) may fail to exist for either of two reasons: (1) For 
some § € $, P,(S) = 0, or (2) >>| f(2) |pa(z) diverges. 

zeS 

Definition 3.6.3. Let g be any random variable, and let v be any 
point in the range of g. The sets Sẹ = {z:g(z) = v} form a partition 
Sz on the sample space determined by g. For a fixed w, the random 
variable E(f | Se), also written E(f | g), is called the conditional ex- 
pectation of f, given g. 


Let Z = (Z, 9, p) be a sample space, and let = be a class of random- 
ized strategies for nature defined on Q. We are interested in finding a 
function q such that, for each eZ, ge is a probability distribution on 
Z X Q which assigns the value £(w) to the event Z X {w} and the value 
P(e | w) to the conditional event {z} X 9, given that the event ZX {o} 
has occurred. More precisely, we are looking for a function q such that 


the sample space 
Z’=(Z4X2,2, 


has the property that for every z, w, and £ 
a) Q(Z X lol) = Eo) 
and for E(w) > 0 
Z Qde) X 2] Z X toh = pela) 
where for any set S in Z X Q 

Q(8) = L geo) 

(a)eS 
Consider any w for which (w) > 0. Then, since Qz satisfies (1) and 

(2), we have by (3.6.2) 
#) axle, o) = pel DE) 
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If (w) = 0, then (3) holds by (1) since, if a set has probability zero, 
every element in it has probability zero. Conversely, a function q satis- 
fying (3) also satisfies (1) and (2) as can be seen by direct computation. 

Let Z’ = Z X Q, and let g and h be two random variables defined on 
Z’ such that for 2’ = (z, w) e Z’ 


gz’) =z and h(z’) =w 


Then, by Definition 3.4.2, the function g is the joint probability dis- 
tribution of g and h, given &. 
It follows from (2) that if nature employs a mixed strategy £, the 
quantity p(z | œ) is then in fact the conditional probability of z, given w. 
Let S be any set in Z’ = Z X Q such that Q(S) > 0. Then for any 
vector-valued random variable f defined on Z’ the conditional expected 
value of f, given S [see Definition 3.6.1], is 


Develo) 
E(f | 8) = a in 


We consider two special cases for f and S which are of statistical 
importance. 


Case 1. f(z’) = 1 for z’ = (z, w); f(2’) = 0 otherwise; § = {z} XQ. 
Then | 

: — PE | o)l) 

RUIS -S paloa 


That is, E(f | S) represents in this case the conditional probability that 
nature’s randomized choice is w, given that the outco 
ment is z. This conditional probability is known a; 
probability of w, given z, and is designated by £,(w). 
&,(w) is also known 
definition: 


me of the experi- 
s the a posteriori 


(The expression for 
as Bayes’ formula.) Fi ormally we have the following 


Definition 3.6.4. Let Z = (Z, 9, p) be a sample space, and let = be 
a class of probability functions defined on Q. Then 
probability distribution of w, given z, is the conditional 
w, given z, i.e., the function £, 
and which is such that 


the a posteriori 
l distribution of 
which for a fixed z e Z is defined on £ X Q 


a) = PE| oa) 
mo 2 P(e | EO) 
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Case 2. f(z’) = L(w, d(z)) for some de D; and S = {z} XQ. Then 
È Llo, d@))p| o)l) 
E 8 = wed 
GIS Erel 
wel 


= © Le, d@))é-() 


That is, Z¢(f | S) is the conditional risk, given z. This quantity will 
be designated by 7:(d). The dependence of this function upon £ will be 
suppressed, since in any given discussion ¢ will be fixed. Formally we 
have the following definition. 


Definition 3.6.5. Let Z = (Z, 2, p) be a sample space, let A be an 
arbitrary space of actions, let D be a class of decision functions mapping 
Z into A, let L be a loss function defined on 2 X A, let = be a class of 
probability distributions defined on Q9, and let & be the a posteriori 
probability distribution of w for ¢ in Z and z in Z. Then the function 
Tz Which for a fixed z e Z is defined on Z X D and which is such that 


7-(d) = Z Llo, d(z))E(w) 


is called the conditional or a posteriori risk function given z and d. For 
ae A, 7,(a) denotes the number 7.(d) for the particular function d with 
d(z) =a. 

It will often be found convenient to consider an equivalent class [see 
Theorems 7.2.1 and 8.3.1] of randomized strategies for the statistician 
defined not on the space of functions D but rather on the sample space. 


Definition 3.6.6. Let Z = (Z, 2, p) be a sample space, let A be a 
Space of actions, and let @ be a class of functions ¢ defined on A X Z 
such that, for each ze Z, øz is a probability distribution on A. Then 
® is a space of randomized strategies for the statistician. 


To select a randomized strategy p from ® is to select the following 
tule of behavior: For every outcome z of a single experiment, the statisti- 
cian chooses action a with probability ¢.(a) = g(a | 2). The risk func- 
tion p is then defined as follows: 

Definition 3.6.7. Let Z = (Z, 2, p) be a sample space, let A be a 


Space of actions, let be a space of randomized strategies for the statis- 
tician, and let L be a loss function defined on 2 X A. Then the (ran- 
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domized) risk function is the function p defined on 2 X ®, such that 
plo, o) = DL, aE.(e(@| 2)) = E E Le, agla | pE | v) 
aeA aeAdzeZ 
where ¢ is in ®. 


3.7. Games Involving Densities 


In order to enlarge the scope of the applicability of the statistical 
theory to be developed, it will be necessary for illustrative purposes to 
discard the restriction that every w £ Q is zero except on a countable set 
in Z. For these purposes a sample space is defined as a triple X = 
(X, 9, p) where X is an n space and for each w €Q, py is a density de- 
fined on X (see Definition 2.6.2). A vector-valued function f defined 
on X is a random variable, and, if the integral 


f S(®)pa(x) dx 
x 


exists, it is called the expectation of f, given w, and is denoted by E,( Ds 
In particular, if f is the characteristic function of a set B in X, then 
L..(f) is called the probability of the set B, given w. For x = (ti, +++, 2p) 
any element of X, let fx be the characteristic function of the set 


B: = {7:7 = (yay +*+, Ya) E X, y1 S iy ++, Un S Tn} 
Then the function F, such that 


Fu(x) = Bu(fx) 


is called the cumulative distribution function (cdf) of the vector x, given w. 
Let @o consist of densities, and let d be a function mapping X into 
the space of actions A. The risk function p is defined by 


a) plw, d) = L L(w, d(x))palx) dx 


We conclude this section by definin, 


g the a posteriori probability dis- 
tribution & of w, 


given x, and the a posteriori risk function 7, for the 
case where & is a class of densities defined on Q. For this purpose we 
take Q to be a k-dimensional Cartesian space. Each element in Q de- 
termines a probability distribution or density on X. The function & is 
then defined by the equation 


i = p(x | @)£(@) 


o 


s- f PEL DEO dos, +++, do 
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and the a posteriori risk function 7, is defined by the equation 


rld) = f -f Lo, ADE) day, «=, don 


3.8. Preliminary Remarks Concerning Sequential Games 


The single-experiment game, though often encountered in statistical- 
decision theory and practice, is a very special type of game. By re- 
stricting the experimenter to a fixed sample-size experiment, the class 
of strategies open to him deals only with the type of action he is to take 
once he knows the outcome of the experiment. But clearly the conduct 
of the experiment itself can be made into a legitimate strategy for him. 
Thus, for example, instead of exhausting all N observations, the experi- 
menter might take observations in sequence and at each stage decide 
on the basis of the information thus far collected whether to stop ex- 
perimenting and select an element from A or to take another observa- 
tion. If the observations were costless, the enlargement of the class of 
Strategies by introducing sequential-sampling procedures of the above 
type would not alter his behavior, since he couldn’t lose and might 
actually gain by taking all V observations (assuming he is restricted to 
that many). However, in most real situations, observations are costly, 
and the experimenter might greatly improve his situation if at each 
stage of experimentation he balances the cost of taking future observa- 
tions against the expected gain in information from such observations. 

As an illustration of what might be gained by allowing sequential 
Procedures as part of the decision maker's strategy, consider the follow- 
ing example: An inspector at an Army Proving Ground has to decide 
whether to accept or reject a lot of rocket-propellent powder on the 
basis of the performance of 5 randomly selected rockets which he is to 
fire. A propellent is called defective if the pressure developed in the 
rocket chamber is 3000 Ib. or more per square inch. The acceptability 
of the lot depends on the proportion of defective items in the lot, and 
the decision is to be based on the number of defective rockets found in 
the sample of 5. (In practice, the actual pressures observed would pre- 
sumably form the basis for a decision. The problem as stated is some- 
what artificial but has been chosen as an illustration because of the 
Simplicity of the resulting space of outcomes X.) ; f 

For each rocket fired, let « be the random variable which has value 
l if the propellent is defective and 0 otherwise. Then, if all 5 rockets 
are fired, the space of outcomes X consists of the 32 points t1, ---, t32 


given on the next page. 
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Space of Outcomes for a Binomial Variable with 5 Observations 


zı = (0,0,0,0,0) zs =(0,0,0,1,0) zx = (0, 0, 0, 0, 1) zz = (0, 0,0, 1, 1) 
vz = (0, 1,0,0,0) zw = (0, 1,0, 1,0) 215 = (0, 1,0,0,1) 225 = (0, 1,0,1, 1) 
tz = (1,0,0,0,0) 211 = (1,0,0, 1,0) z= (1, 0,0,0,1) zə = (1, 0,0, 1, 1) 
= (1, 1,0,0,0) 212 = (1, 1,0,1,0) z% = (1, 1,0,0,1) zs = (1,1,0, 1 1) 
zs = (0,0, 1,0,0) 215 = (0,0,1,1,0) zz = (0,0, 1,0,1) a9 = (0, 0, 1, 1, 1) 
ze = (0, 1,1,0,0) zı = (0,1,1,1,0) zə = (0,1,1,0,1) zz = (0, 1,1, 1,1) 
zı = (1,0, 1,0,0) as = (1,0,1,1,0) zz =(1,0,1,0,1) z3 = (1,0, 1, 1, 1) 
zs = (1,1,1,0,0) aie =(1,1,1,1,0) za =(1,1,1,0,1) za = (1, 1,4, 1)-1) 


The space A of actions consists of only two points, a1, a2 where a; 
stands for the acceptance and az the rejection of the lot. 

It will be shown in Chapter 7 that, however the Government measures 
losses from a wrong decision, a class of “admissible” strategies [see 
Definition 5.4.9] if all 5 rockets are fired is given by the following: Let 
m stand for the number of coordinates that have value 1 in x, and let k 
be any integer from 0 to 4. Then accept the lot if m < k (i.e., define 
d(x) = a if m < k), and reject the lot if m > k (i.e., define d(x) = ay 
if m > k). 

Now it is clear that, since firing a rocket is costly, the Government 
might gain appreciably if the rockets were fired one at a time and the 
appropriate decision made as soon as the value of d became known. 
Thus, for example, if k = 0, after the first observation it becomes 
known that d(x,;) = ay for i = 3, 4, 7, 8, 11, 12, 15, 16, 19, 20, 28, 24, 
27, 28, 31, and 32; after the second observation it becomes known that 
d(x;) = a for i = 2, 6, 10, 14, 18, 22, 26, and 30, and so forth. In 
fact, only if the outcome of the experiment is xı or 247 will all 5 observa- 
tions be required to make a decision. Similarly, if k = 2, after 3 obser- 
vations it becomes known that d(xi) = as for i = 8, 16, 24, 32 and 
Ali) = a for i = 1, 9, 17, 25. Thus we see that, though the above 
class of decision procedures is admissible as a costless fixed sample-size 
experiment, it is definitely not admissible if the cost of the observations 
is taken into account. To find a class of admissible decision procedures 


in the latter case, we must enlarge the class of strategies by introducing 
sequential-sampling procedures, 


3.9. Space of Statistician’s Strategies in Truncated Sequential Games 


To define a statistical 
possible strategies for the statistician 


Sec. 3.9 STRATEGIES IN TRUNCATED SEQUENTIAL GAMES 91 


(1) The total number of possible subexperiments does not exceed a 
certain preassigned integer N. 

(2) The sequence in which these experiments are to be performed is 
predetermined and is not part of the statistician’s strategy. 

Assumption 1 states that we shall be dealing only with truncated 
sequential procedures. This restriction is necessary since the notion of 
a joint probability distribution has been defined only for a finite num- 
ber of random variables (see Definition 3.4.2). However, in Chapter 9 
this restriction will be removed for a large class of statistical-decision 
problems. Assumption 2 is of no consequence if the random variables 
arising from the choice of the subexperiments are independently and 
identically distributed for each w  Q [see Definition 3.11.3]. In all other 
cases the sequence in which the subexperiments are performed does 
matter and complicates the characterization of a general sequential 
procedure. 

Under assumptions 1 and 2 the sample space X = (X, Q, q) has the 
following structure: For each w in Q the function gwu is a probability 
distribution defined on X and each point ve X is an N-tuple x = 
(€i, £2, +++, xy) where each z; i = 1, +++, N, is the value of a random 
variable, but we allow the possibility of not observing all the coordi- 
nates of a point x but only the first j = 1, 2, +++, N of them before a 
decision is made. That is, we shall now consider rules that to each point 
x of X assign two numbers, an integer j = 1, 2, +++, N which specifies 
the number of coordinates of x to observe before terminating experimen- 
tation, and an element a of A which specifies what action is to be taken 
once experimentation is terminated. It is clear that, if a rule assigns 
the value (j, a) to a point x e X and if a point y e X has the same first 7 
coordinates as x, the rule must assign the same value (j, a) to y also. 
Thus a truncated sequential procedure is a decision function 5 with 
values 6(x) = (v(x), a(x)) which maps every point x = (ay, +++, ey) of 
X into the product space J X A with elements (j, a) where J stands for 
the set of integers (0, 1, +-+, M) and A is the set of terminal actions, 
and, if 8(c) = (jo, ao), then, for every y = (Yı, ++", yw) © X such that 
Yi = z; for 0 < i < jo, 6(y) = (Jo, Go) also. The reason zero is included 
in the set J is to permit taking an action without experimentation which 
may sometimes be desirable in case the cost of taking observations is 
high. Thus for any function ô such that 6(x) = (0, a) for any re X 
then 6(x) = (0, a) for all z e X. 

As an illustration of sequential sampling rules, consider the rocket- 
firing experiment with the sample space X discussed above. A possible 
Sequential rule is as follows: fire the rockets 1 at a time and stop as soon 
as 2 defectives are found; in any case stop when all 5 rockets have been 
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fired. Let n be the number of rockets fired by this rule. Suppose that 
the first problem is to decide whether to accept or reject the lot. A 
possible criterion is the following: accept the lot (i.e., take action a) if 
n > 4; reject the lot (i.e., take action ag) otherwise. The function ô 
then has the following values: 


(a1) = (5, a) A(x) = (5, a1) (217) = (5, a) 8(x25) = (5, a1) 
ôlz2) = (5, a) 8(x10) = (4, ar) 6(z1s) = (5, a) 5(a26) = (4, ar) 
ô(z3) = (5, a1) 6(au) = (4, a) (x19) = (5, a) 8(x27) = (4, a) 
ô(x4) = (2, a2) 6(x12) = (2, ae) 8(x29) = (2, az) (x23) = (2, a) 
(a5) = (5, a) (x13) = (4, ai) 5(x21) = (5, ay) ôlx29) = (4, ai) 
6(a5) = (3, ae) ôlz14) = (3, ae) (x22) = (3, ax) 8(x39) = (3, a2) 
8(x7) = (3, ae) (x15) = (3, a2) 8(x93) = (3, a2) 6(x31) = (3, a) 
5(xs) = (2, ae) 8(x15) = (2, a2) ôt) = (2, a) ôlz32) = (2, ae) 


Suppose now that the second problem is to estimate the proportion p 
of defective items in the lot. The space of actions A in this case is 
the closed interval (0, 1). A possible sequential estimation rule is 
the following: Let m be the number of defectives found; if n < 5, take 
(m — 1)/(n — 1) as an estimate of p, and if n = 5, take m/5 as an 
estimate. For this estimation procedure, the function 6 has the follow- 
ing values: 


6(x1) = (5, 0) (xg) = (5, 1/5) 6(x17) 


7) = (5, 1/5) (v9) , 2/5) 
(2) = (5, 1/5) (tw) = (4, 1/3) (ars) = (5, 2/5) (xz) , 1/8) 
&(x3) = (5, 1/5) ôlzu) = (4, 1/3) (x19) = (5, 2/5) êlor , 1/3) 


) 
ô(x4) = (2, 1) (x12) = (2, 1) 8(x20) = (2, 1) 8 (22s) 
8(x5) = (5, 1/5) (x13) = (4, 1/3) ô(x21) = (5, 2/5) ôl) 
) 
) 
) 


pel ed ed 
eS 
bo 
© 


(6) = (3, 1/2) (is) = (3, 1/2)  òlza) = (3, 1/2) Cry 
èla) = (3, 1/2) (nas) = (3, 1/2) (wag) = (3, 1/2) Cay 
5(as) = (2, 1) ô(z16) = (2, 1) ôlza) = (2, 1) Base 


id a dono 


Roman 


, 1/2) 


VPSSENEPRA 
= 
ya 
= 


It is clear from the above exam 


ples that every sequential-sampling 
plan and terminal-decision rule de 


fines a function 6 of the type con- 
sidered. It is not, however, immediately obvious that a sequential- 
decision function 6 is in fact a rule that tells the experimenter at each 
stage whether to take another observation or to stop experimenting 
and make the specified decision. To see that this is the case we give a 
second definition of a sequential procedure where such a rule is clearly 
implied and show that the two definitions are equivalent. 

Let J be the set of integers (0, 1, 2, ---, N), and let K be any subset 
of J. A set S in X is called a cylinder set over K if whenever x; = yi for 
allie K, i = 0; then z e S if and only if ye S. (Note that, in order to 
prove that a set is a cylinder set over K , it is sufficient to prove the “if” 
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part or the “only if” part since the definition is symmetric in x and y.) 
In terms of cylinder sets we define a sequential procedure as follows: 

A sequential procedure 6* consists of a partition $ of X with elements 
So, S1, ++ -Sy such that each S; is a cylinder set over K = {reJ :0 <r sjh 
together with a sequence of functions do, dı, ---, dy such that, for each 
j, d; is a function of the first d coordinates of x only and maps S; into 
A. We shall call the partition $ a sequential-sampling plan and the 
sequence of functions {d;} a terminal-decision rule. We remark that 
So is either all of X, in which case S; is empty for 7 > 0, or So is empty. 
Also the sequence of functions {d;} will generally depend on the par- 
tition $. This dependence will not however be exhibited in what fol- 
lows. 

We now show that the two definitions are equivalent. Assume that 
ô* consisting of a sequential-sampling plan $ and terminal-decision rule 
{d;} is given. To find a sequential function 6, we let v(x) be that j such 
that x eS; and define a(x) = d») (£). Clearly, the function ô which for 
every x e X has the value 


8(x) = (v(e), e(z)) 


is a uniquely defined sequential-decision rule. Conversely, let a sequen- 
tial-decision rule 6 with 5(x) = (x(x), a(z)) be given, we shall find a 
sampling plan $ and a decision rule {d;}. If ô is such that 6(~) = (0, a), 
we take for $ the set So = X and for {dj} the single function do with 
do(x) = a. Otherwise, we define 


Sy = {xz v(x) = k} 


and show that S; is a cylinder set. Suppose x € Sz, and let y be such that, 
for0 <i < k, x; = yi: Then, by the definition of ô, 6(a) = 4(y) so that 
v(x) = v(y) = k, and hence ye Sr also. To define a terminal-decision 
function dy for each Sp, we set d(x) = a(x). Then d(x) depends only 
on the first k coordinates of x since, if y: = ti for 0 < i < k, then, by 
the definition of S, and the condition on ô, 6(x) = 6(y) and a(x) = aly), 
and, consequently, d(x) = d(y). Thus, dą is a mapping of the re- 
quired type. 

Finally, simple considerations show that, if we start with a sequential- 
decision function 6 and apply the above construction to obtain a sequen- 
tial procedure 6* and from this 6* in turn obtain a sequential-decision 
tule 6’, then 6’ = ô. Conversely, if we start with a 6* and obtain a 6 
and from this ô a 6**, then 6* = 5**. 

In many statistical problems, the sequential-sampling plan $ is fixed. 
What is required is to find an optimal terminal-decision rule. For this 
and other reasons which will become clear in Chapter 9, it is con- 
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venient to remove the dependence of the terminal-decision rule {d;} 
upon the partition $. To this end we define a general class D of terminal- 
decision functions d, mapping the product space J X X into A such 
that, if d(j, x) = a and y is any other point in X such that Yi = x; for 
0 <i <j, then d(j, y) = aalso. Clearly, given any sequential-sampling 
plan $ and any d e D, then d determines a sequence of terminal-decision 
functions d; such that d; maps S;e$ into A. We now give a formal 
definition of a class of sequential-decision functions which will be em- 
ployed in this book. 

Definition 3.9.1. Let Z = (Z, 2, p) bea sample space, let fı, «++, fy 
be random variables defined on Z, let £ = (Z, 9, q) be the sample space 
such that X = X, X Xə X ++- X Xy where X; is the range of fi i 
= 1,2, ---, N, Qo is obtained from pg according to Definition 3.4.2. 
In addition, let A be an arbitrary space of actions, let J = {0,1,---,}, 
and let © be the class of partitions of X such that, if $ = (So, Si, +++, 
Sy) € ©, then each element S; of $ is a cylinder set over K = {reJ:0 
<r <j}. Moreover, let D be the class of functions d which map J X X 
into A and are such that, if z and y are in X, x; = y;for0 <i < j and 
d(j, x) = a, then d(j, y) = a also. Then the product space © X D is 
called the class of sequential-decision functions. 


We remark that the elements (S, d) of S X D are the pure strategies 
for the statistician in a sequential game. We also remark that formally 
speaking the reference to the sample space Z and the random variables 
fi, +++, fy could be omitted. These concepts were introduced here to 
make clearer the underlying structure of sequential procedures. It is 
to be remembered, however, that the f;’s are not necessarily identically 
distributed random variables, and the spaces X; may be different for 
each i. 

As an illustration of a partition $, consider the sequential procedure 


for the rocket-firing experiment given in the second table. Here $ = 
(S2, S3, Ss, Ss) where 


So = (4, xg, T12, X16, T20, T24, Log, T32} 

Ss = {x6, tz, T14, 245, T22, T23, T30, T31} 

Sa = (10, £11, £13, £26, X27, Log} 

Ss = (x1, T2, T3, 25, 29, T17, Tig, Lig, T21, Lop} 


The function d maps the sets Sq and S; into 1; Se and S3 into ap. 
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3.10. Definition of Truncated Sequential Games 


Having characterized the spaces of pure strategies for the two players, 
it remains only to specify the payoff or risk function in order to complete 
the definition of truncated sequential games. However, to define the 
risk function for games with sequential-sampling rules, it is necessary to 
take into account the cost of performing each subexperiment. This cost 
may depend only on the number of subexperiments performed, or, more 
generally, it may depend on the outcome of these experiments, such as, 
for example, in medical experiments on animals, where the cost of the 
observation depends on whether or not the animal survives. 


Definition 3.10.1. Let X = (X, 9, q) be a sample space, where X = 


Xı X- X Xy, let J = {0, 1, +++, N}, and let c be a non-negative 
function defined on J X X such that if x and y are in X and 2; = yY; 
i= 1,2, ---,j, then c(j, x) = c(j, y). Then c is called a cost function. 


Thus, for example, if the cost is simply proportional to the number of 
subexperiments performed, then c(j, x) = kj for all v for which the de- 
cision rule requires terminating sampling with j observations and where 
k is a positive constant. 

In what follows we shall write ¢;(x) for c(j, x). 

Definition 3.10.2. Let X = (X, 9, q) be a sample space, where X = 
X, X---X Xy, let A be a space of actions, let © X D be the class of 
sequential decision functions (for and A), let L be a loss function de- 
fined on Q X A, and let c be a cost function defined ond xX x : where 
J = {0, 1, ---, N}. Then the payoff or sequential risk function is the 
function p defined on 2 X GS X D such that 


plo, 8, d = F E lez) + LO, d, a))asle) 


j=0 re 5; 

As in the case of a single experiment, p(w, $, d) will exist for all deci- 
Sion functions (S, d) of the type considered. If X is an N space and the 
elements of Qo are densities, then the equivalent risk function is de- 


fined by 
N 
= tee x) + Llw, aj, *))Iqu(*) dar, +++, ae, 
p(w, $, d) -2f - fix ) + Le, a9 1 N 


We now give a general definition of a truncated sequential game. 


Definition 3.10.3. Let Z = (Z, 9, p) bea sample space, let fi, +++, fy 


be random variables defined on Z, let X = (X, 9, q) be the sample space 
such that X = X, X -> X Xw, where X; is the range of f;, for i = 1, 
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-++, N, and Qo is obtained from Fa according to Definition 3.4.2, let 
A be a space of actions, let G X D be the class of sequential-decision 
functions (for X and A), and let p be a sequential risk function defined 
on 2 XG XD. Then the game (Q, S x D, p) is called a truncated- 
sequential game. 


3.11. Some Further Theorems on Probability 


The following definitions and theorems will be used in the subsequent 
chapters of this book. 


Definition 3.11.1. Let Z = (Z, Q, p) be a sample space, let Sı, So, 
+++, Sn be subsets of Z, and let w be an arbitrary element of Q. The 
subsets (events) S1, S2, ---, S, are said to be independent with respect 
to pw if, for every set of subscripts 7), ia, «+ +, Ík, no two equal, 


PS, N Si N+- N Sa) = Po(Si,)Pu(Siz) «++ Pa(S;,) 
for k = 2, 3, +++, n; i; = 1, 2, n. 


Definition 3.11.2. Let È = (Z, 9, p) be a sample space, let $1, So, 
++, Sn be n partitions of Z, and let w be an arbitrary element of Q. 
The partitions Sı, So, - - *, Sn are said to be independent with respect to 
Pw if every collection of n sets Si, So, +++, Sn that can be formed by tak- 


ing one element from each of the n partitions is independent with re- 
spect to py. 


Definition 3.11.3. Let Z = (Z, 9, p) be a sample space, let fi, fe, 
‘++, fx be n random variables defined on Z, and let w be an arbitrary 
element of Q. The random variables fi, f2, +++, fn ave said to be inde- 
pendent with respect to Pu if the partitions $, Sp, +++, Sy determined 
by fi, fo, +- E respectively, are independent with respect to pu. 

Definition 3.11.4, The n events Si, So, 
S2, +++, Sa, or n random variables f,, fo, 


pendent with respect to ®g if they ar 
for every we Q. 


“++, Sn (or n partitions $1, 
"++, fn) are said to be inde- 
e independent with respect to Pu 


Theorem 3.11.1. The partitions $}, 82, ++, Sn are independent with 
respect to pa if and only if for every collection of n sets Si, So, +++, Sn 
with S; € S; 


(i) P(S NN Sn) = P..(S) -+ Po(Sa) 


Proof. It follows from Definition 3.11.2 thai 


a 3 t if the partitions are 
independent then (i) holds. Conversely, 


Suppose (i) holds, we must 
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show that the partitions are independent; i.e., for every arbitrary col- 


lection of n sets S1, ---, Sn with S; € S; 
a) P.(Si, N -e N Sa) = Palsi) +++ Pali) 
where i, i2, +++, 7 with 2 < k < n are any k arbitrary subscripts, no 


two of which are equal. We shall prove (1) by a backward induction 
onk, Fork =n 


(2) P.(S1 N+: Sp) = I] PoS) 

i=l 
by hypothesis. Assume that (1) holds for k =n — 1, n — 2, ++, 
m (m > 3). To show that it holds for k = m — 1, let t1, tg, +++, îm be 
m distinct subscripts. Then since 
(3) Sa AN Saa = U Bi 2-1 Sina N Sin) 

Sim £ Sim 

we have 
(4) PalSa NN Sini) = Po k U (Sa AN Sima N Si | 
so that, by the induction hypothesis, 

m-1 
(5) Palsa NN Sig) = E TT PalSs)PalSin) 

Sim € Sig B= 


m—1 
= jii PolSi) Da Pa(Sin) 
k=1 S, 


im © Sim 


But U S; = Z so that ZPo(Si„) = 1 and consequently 
m—1 


(6) Po(Si, NN Sin) = TT Pale) 


which proves the theorem. 
++, fn are independent with 


Corollary 1. The random variables f: 1 fas * 
respect to pa if and only if 
qo(01, V2, °°") Un) = Mo(01)G20(v2) da Qno(Un) 
where go is the joint probability distribution of fı, +++, fx and qiw is the 
distribution of fi 
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Definition 3.11.5. Let 3 be a partition of Z. A collection $ of sets of 
Z is called a subpartition of 5 if (a) $ is a partition of Z and (b) each 
S £ 8 is a subset of some T in 3. 


Theorem 3.11.2. Let S; (¢ = 1, 2, ---, n) be subpartitions of the n 
partitions 5; of Z. Then, if the collection (S1, S2, +*+, Sn) are independ- 
ent with respect to some Pa £ Pg, so is the collection 31, 32, +++, In. 


Before we prove this theorem, we require the following lemma. 


Lemma 3.11.1. If $ is a subpartition of 5, then any T e 3 is the union 
of sets in 8. 


Proof. Let Sr be the set of elements of $ which are subsets of T. We 
shall show that T is the union of these sets. Since, for every Se Sr, 
S C T, it follows that 


U scr 


SeSr 


Hence, it is sufficient to show that there exist no elements in T that are 


not in LU S. Assume the contrary, and let zeTNC( U 8). 
SeSr SeSr 
Then, since $ is a partition of Z, there exists some S* e $ such that z e S*. 


Now let T” be that element of 5 such that S* C T”. Hence, since ze T 
and ze S*, it follows that ze T N T’, and, since the sets in a partition 


are disjoint, T = T’. Thus 8* C T; i.e., S*eSp and z¢C( U S). 


SeSr 
This contradiction proves the lemma. 


We now return to the proof of the theorem. 
In order to prove that 5, +++, 5, are independent, it is sufficient, by 


Theorem 3.11.1, to show that for any arbitrary sets T;e 3; (i = 1, 2, 
oom n) 


PAT N-A Ta) = TT P(T) 
i=1 


Now, by Lemma 3.11.1, 


T= U S, Sr € Sp 
Ske Sr, 
so that 
Tien r= U = U Wn---ns,) 
Sie Sz, Sne Sp 


and since the partitions $; are independent 
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P(T N- A Tr) = DP pis > P.(S1) +++ Pu(Sn) 


Sie Sr Spe Sr, 


E PREJ L PRO) 


Sie S7; Sne Sr, 


P.( U si) --Pa( U sn) 


Sie ST, Sne Sty, 


l 


= P..(T1) +++ Pa(Pn) 


which proves the theorem. 


Theorem 3.11.3. Let fı, fo, ***, fn be n random variables which are 
independent with respect to some Pe € Po, and let 91, Jo, ***; Yn be n 
vector-valued functions. Then the random variables gı Ofi, --:, 
Jn O fn are independent with respect to Pw where g; O f: stands for the 
composition of the two functions g; and fi. 


Proof. The proof of this theorem follows from Theorem 3.11.2 since 
8, is a subpartition of S,,07- [See Lemma 8.2.1,] 


Theorem 3.11.4. For two independent numerical random variables 
f and g 
Eo(fg) = Eo(f)Eo(9) 
provided .,(f) and E.,(g) exist. 
Proof. Let rs be the joint probability distribution of f and g. Then 


Tav, w) = Pas (0) Gea) 


distribution of f and ge is the probability 


where p, is the probability an eee, 


distribution of g, where v is in the range 0 
Now 


Bu(f)Eu(g) = E vipato) E walw;) = Z vitesPalesd gals) 
= Ð vwe w) = Lu Z ros ws) 
a,j u 


viwj=u 


= uP. {e:f@g9@ = u) = E.(f9) 
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Theorem 3.11.5. Let Z = (Z, 2, p) be any sample space, and sup- 
pose that, for all z € Z, 


Panl) > Dalz) as n> o, 
fae) > JC) as n>, 
hO] <M  foralln 
Then E,,(fn) > Eu(f) asn > œ. 
Proof. Fix «> 0, and choose 21, +++, 2%, so that 


k 
E pul) >1l-—e 


i=1 
Now choose N so large that, for n > N andi = 1, ---,k, 


| Dun (2i) — Pw (Zi) | < e/k 
Then, forn > N, 


È pale) > È pale —e>1-2 
Thus, for n > N, 
(1) | Eola) — Balha) | < ÈI Pon(2i) — Poli) | | fa | 
F a, PaE + pol@)] | ful2) | 


< «M+ «M+ 2M = 4eM 
and, for all n, 


k 
(2) | Bu(fn) = Bo) | < E poel Sule) = Flea) | 
i=l 
k 
+ 2 PORO — $6) |< E pedl hed — Se) | + 2M 
i tty Bk i=1 


Taking the limit superior of both sides in (1) and (2), we obtain [see 
problem 3.11.7] 


(3) lim sup | Eon (fn) — Eu(fn) | < 4eM 
and 
(4) lim sup | Bu(fn) — E(f) | < 2M 


Since e is any positive number, (3), (4), and problem 3.11.7 imply 
Eoln) > Elf) 
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Corollary. Eu(fn) > E(f). 
Proof. Choose wn = w for all n, and the hypotheses of the theorem 
are satisfied. 
PROBLEMS 


Prove the following theorems on conditional expectations defined on a sample 


space Z = (Z, Q, p). 
3.111. Elf | (2:9 © Bi) = BulBa(s| 9) | 12:96) © BH 
3.11.2. Let $ be a partition of Z, and let E,(f) exist. Then 


EES | 9) = Bol). 
In particular, if Sẹ is the partition determined by the random variable g, then 
E{EoS| 9] = Eo) 
3.11.3. Let $ be a partition of Z. Then 
EuB |9 |S = E19 
3.11.4. Let 5 be a partition of Z, and let $ be a subpartition of 3. Then 
BAS |3) = BalBolf |9 3] 
In particular, if h O g stands for the composition of the tw 
Bog | hog) = EalBalf| 9) hog] 


o functions h and g 


3.11.5. For any convex function h 
Bah O f) = Balk O g) 


where g = E,(f | S) and S is a partition of Z. 
3.11.6. Let h = Eo(f|3). Show that Eo(f |h) = h 
3.11.7. Given a sequence of real numbers fan}, let 


lim sup @n = inf sup am 
48 k mak 


im infan = Pah 
(k, m = 1, 2,3, +++). Show that a necessary and sufficient condition that lim an =0 
is that lim sup | an | =0. 
3.11.8. Let 
In = Lpa) H > FO), 1 = Lee) 


If there is a g(x) such that 
ee | fala) | < g@) for all n, x 
= DY pag) converges 

then I, > Z asn > ». ó 


3.11.9. Show that the corollary of 
3.11.8. 


Theorem 3.11.5 is a special case of Problem 


CHAPTER 4 


Utility and Principles 
of Choice 


4.1. Introduction 


The theory of zero-sum two-person games developed in Chapters 1 
and 2 is intended as a theory of rational behavior in an unknown situa- 
tion, when the unknown factor is the strategy chosen by an intelligent 
opponent whose interests are diametrically opposed to one’s own. The 
statistical games discussed in Chapter 3, on the other hand, aim at an 
analysis of situations in which the unknown factor is the state of a pre- 
sumably neutral world. Games against an intelligent opponent and 
statistical games have the same formal structure, being characterized 
by a triple (X, Y, M). 

However, in stating that a game is a triple (X, Y, M), where M is a 
bounded numerical function defined on X X Y, we already assume the 
concept of numerical utility, since, for x in X and y in Y, M(a, y) is in- 
terpreted as the negative of the utility to player II (the decision maker) 
of the outcome resulting from the choice of strategy x by I and strategy 
y by II. Our task in this chapter thus naturally falls into two parts: 
to develop the theory of numerical utility and then to examine various 
principles for choosing a strategy, once the utilities of the various pos- 
sible outcomes are given. Before turning to the first part of this task, 
it will be useful to elaborate upon the description of the structure of a 
game given in Section 1.2, 

Every two-person game can be described by (i) two spaces F, and Fo 
of strategies for players I and II, respectively, (ii) an outcome space R, 
and (iii) the association with each pair f = (fy, fo) in F, X Fs of a 
probability distribution ry over R, Since the theory of utility is de- 


Chapters 1 and 2, in games against an intelligent opponent, probability 
distributions enter in the form of chance moves by the referee and also 
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through the use of mixed strategies by either player. In statistical 
games, F, is a parameter space, or index set for probability distributions 
on the sample space. Correspondingly Fə is the space D of decision 
functions mapping Z into a space A of possible actions by the decision 
maker. Each decision function d e D is a random variable defined on Z 
with values in A, and thus [see Definition 3.4.2] for each d e D there is a 
function pa defined on A X Q with values pa(a | w) such that, for each 
w EQ, pa is a probability distribution on A. (Alternatively, each de D 
generates a sample space (A, 2, pa).) The decision maker may also 
use a mixed strategy by selecting a d in D according to a probability 
distribution 7 over D or by randomizing over the sample space [see 
Definition 3.6.6]. A choice of an ne H also induces a probability dis- 
tribution on A for each w, namely 


qn (a | wo) = 2 10) a pe |o) 


which in terms of the function pa is given by 


mlalo) = E napala] w) 
de D 


Nowa specific action a by the decision maker results in a specific out- 
come r in R. More precisely, in any decision problem there is a fixed 
function g which maps A into R. This function is a random variable, 
and for a fixed 7 and w it induces a probability distribution on R. 

We sce then that in general games and statistical games what we are 
faced with in a decision problem is a choice of one of a set of probability 
distributions on R. An essential question we must answer in choosing a 
strategy is: What is the relative value (i.e., utility) of the various pos- 
Sible probability distributions over R? 

F k the ae of the problems central to this chapter we make a 
following concise formulation of the decision problem. _Let Q be the 
Space of strategies of the opponent. Each 7 in H determines a oe 
k, defined on Q and taking values in P, the set of probability istri u- 
tions on R. We may replace the space H by the space K oo 
mapping Q into P; the decision maker is then required to choose ex- 
actly one k from K. ; 

These aa are illustrated in the following example. Suppose 
that a coin is about to be tossed, and that a man must choose among 
dı, betting a dollar on heads; də, betting a dollar on tails; and dg, not 
betting. If w is completely unknown, @ consists of the unit interval 
9 <w <1, and, for each we, po = o is the probability with which 
the coin falls heads. The space H consists of all randomized mixtures 
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of the three elements dı, dz, and dz of D, so that an 7 in H is specified 
by a triple n = (Ay, àz, A3), With 4; > O and à; + Az + Ag = 1, where 
àz is the probability with which the man chooses d;. The outcome set 
R is (1, 0, —1), corresponding to winning, not betting, and losing a dol- 
lar, respectively. The class K consists of (employing the subscript i 
to represent the decision d;): 

(i) The function kı defined on Q with ki(w) = (w, 0, 1 — w), where 
(w, 0, 1 — w) is a probability distribution on R = (1, 0, —1). 

(ii) The function ke with ke(w) = (1 — w, 0, w). 

(iii) The function k with k3(w) = (0, 1, 0). 

(iv) All convex linear combinations of these three functions. 

Thus, in general, k,(w) is given by 


gi = wry + (1 — w) 
Go = ^3 
q-1 = ado + (1 — or 


4.2. Utility 


If w were known (or, more generally, if the mixed strategy of the op- 
ponent were known), the problem of the decision maker would be simply 
that of choosing one of a given set of probability distributions over R. 
But, whether w is known or not, the decision maker must be able to 
judge the relative value to him of any two probability distributions 
over R. In particular, we shall suppose that the decision maker has 
formulated his objectives with sufficient clarity to declare which of two 
outcomes he prefers (or that he is indifferent between them), and, more 
generally, for any two probability distributions p, and p2 over R, to 
declare which he prefers. Formally we have the following. 


Definition 4.2.1. Let R be an arbitrary set, and let P be the set of 
all probability distributions over R. Then a preference relation (or 
preference pattern) > on P is a binary relation such that 

(i) For every pı and pg in P, either P2 = Pi Or pı > po (both may 
hold), and 

Gi) If ps > pz and pz > pı, then pz > py. 


The relation p2 > p; is to be interpreted as meaning pz is preferred or 
indifferent to py. If po > pı but not pı > P2, we say that po is preferred 
to pi, written p2 > py. If po > pı and p; > P2, we say that po is in- 


different to pı, written po ~ pı. The following properties, summarized 
as a theorem, are easily verified. 
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Theorem 4.2.1. The relation ~ is an equivalence relation; i.e., it is 
reflexive, symmetric, and transitive. For every pi, p2, exactly one of 
P2 > Pı, P2 ~ Pi Pi > Poholds. If ps > pz and pe > pi, then ps > pı- 


In the development of the theory of games we were concerned with 
only those preference patterns that may be described by numerical 
utility functions, i.e., for which there exists a bounded function u de- 
fined on P such that 


(1) u (x wpa) = E AnU(Pn) 


©% 
for any sequence \, > 0 with D = 1, and 
1 


(2) u(p2) > u(pr) if and only if pz È Pi 

If such a function u exists, we define (r) = ulg), where qr is the 
particular probability distribution over R assigning probability one to 
the outcome r. Since for any pe P and any weR we have p(w) = 
> p(r)q-(w), it follows that 


p =D plr)g 


Thus, replacing \n by p(r) and Pn by qr in (1), we get 
(3) u(p) = E poula) = X pO) 


Equation 3 says that the utility of any p is the expectation Ep(h) of 
the function h with respect to p. Moreover (2) implies that p is pre- 
ferred or indifferent to g if and only if Ep(h) = E,(h): There is then a 
numerical function h defined on the space of outcomes, such that the 
aim of the decision maker is simply to maximize the expectation of h. 
Conditions under which a utility function exists are given in the next 
theorem. In what follows, we shall write pi < p2 for po = Pı- 
Theorem 4.2.2. Let R be an arbitrary set, and let P be the set of 
all probability distributions on R. Then a preference pattern > on P 
has a utility function if it satisfies 
Hy: If pin X Pen for all n, then 
(a) DrnPin < DAnPon 
for any sequence àn 2 0 with DA, = 1. If in addition pin < pon for 
some n for which An > 0, 


(b) DAnPin < ZnPon 
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(Note that, if for all n, pin ~ Pon, then (a) implies that Drnpin ~ 
DAnDon-) z 

Hs: If pı < po < pz, there are numbers à, u with O <u < 1 
such that 


Api + (1 — A)ps < P2, uP: + (1 —u)pa > po 


Before turning to the proof of the theorem we remark that for any 
sequence {pn} and any sequence {A,} of non-negative numbers with 
DAn = 1, the distribution p = 2A,p, which is an element of P selects 
rin R with probability 


P(r) = Znpr(r) 


Thus the distribution p may be achieved by selecting an integer n ac- 
cording to the distribution A, then selecting r according to pa. Hypothe- 
sis H, then requires that, if for each n the decision maker would be 
willing to have the distribution po, rather than pi, used in choosing 
r, then, if he is required to choose between an unspecified element of the 
sequence {pon} and the corresponding element of the sequence {Pin} 
for the selection of r, he will prefer that the choice be made from the 
sequence {p2,}, no matter what random mechanism is used to accom- 
plish this choice. His preference, moreover, will be definite if the ran- 
dom mechanism selects with positive probability at least one n for 
which there is a positive preference for pon against pin. Hypothesis Ha 
asserts that, if pı < p2 < ps, then, if an experiment is to be performed 
to decide whether pı or ps is to be used in selecting r and pı has proba- 
bility \ of being chosen, there is a à < 1 so near 1 that the decision 
maker will be controlled by his preference of pz over pı, and a \ > 0 
so small that his preference for p3 over pz becomes controlling. (A 
looser interpretation of Ho is that there exists no peP that is either 
infinitely desirable or infinitely undesirable.) 

The proof of the theorem depends on certain auxiliary propositions 
and runs as follows: 

A. For any pı and ps in P with pı < po, and any à and p with 0 < 
<4], 

up: + (1 — u)po < dpi + (1 — A)p2 


Proof. From H; and the hypothesis that pı < P2, we have 

Pi “Api + (1 — Api < Api + (1 — A)p2 < Apo + (1 — 2) po ~ p2 
and thus 
(1) Pi < Api + (1 — d) pe < p2 
For 0 < à <p» < 1 we have the identity 
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A 
(2) amt ( — pe = -ipt Aalt G = y wy 
u u 


and from (1) 
(3) up + (1 — H)pe < p2 
so that by H, and (2) 


A 
(4) Api + (L Np > „MP + (1 — z)pel 


A 
+ (1 = ~) [upi + (1 — u)po] ~ upı + (1 — u)p 
m 


which completes the proof of A. 
B. For any q with po < g < pı there is a unique à such that 
a~ Apo + (1 = )P1 
Proof. If either po~g or g~ pı then the result is immediate. 
Assume then that po < q < pı, and let 
T = {à with 0 <à < 1:g < Apo + (1 — N)p1) 
Clearly 74 Ø. Also if àe T then \ #1. We shall show that T is a 


half-open interval 0 < à < à*. By A we have that, if \»e T and 
Ao > Ay, then A, £ T. It remains to be shown that T does not contain 


a largest element. 
For any À e T, there is by Hz an a, 0 < æ < 1 such that 


E q < apo + (1 — a)Apo + (1 — \)pil 
7 q < [A+ all — Apo + (= a)(1 — d)p1 
so that \ + a(1 — à) e T. Consequently, if A* is the least upper bound 
of the interval T, then Aà*¥ g 7. Let now 

U = {x with 0 < à < 1: Apo + (1 — A)p1 < g} 
Then by a similar argument we can show that U is a half-open interval 
a* <A <1. Thusà* < p* and 

qg ~ po + (1 — Apr 

if and only if A* < à <u*. But by A there cannot be more than one 


à satisfying the above indifference relationship, and hence à* = p*. 
This completes the proof of B. 
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For fixed po and p, with po < pı we define, for all g with po < q 
< pi, the numerical function h as follows: h(q) is the unique number 
— t 
1 — A such tha ings H =A, 
Then from H, A, and B, we see that for any p andr with po < p < pi, 
Po<7T< Pi 
(i) h(p) > h(r) if and only if pèr 
Now from H and the definition of the function h we have for any se- 


quence un > 0 with Zun = 1 and for any sequence gn with po < dn 
< pı for all n, 


D nmn ~ |i =h (= min)| Po +h (= sn) pı 
f 1 1 


But we also have from this definition and H, 


È Hagan ~ È tenl(L — Algn)) Po + Algn)p:] 
1 1 


ey l: a po wha) Po + De Mnh(Gn)P1 
1 1 


Hence we conclude from the uniqueness property of h that 
ao W 

(ii) h (= st) = J. unh(qn) 
1 1 


Any numerical function defined on an “interval” Po S q < pı with 
properties (i) and (ii) above will be called a utility function on the in- 
terval (po, pı). We have proved: 

C. For any po and p; with Po < pı there exists a utility function on 
(Po, P1). 

Now fix a pair go, qı with qo < qı and consider any interval I = 
{p:po < p < pi} containing go and qı. It follows from C that there 
is a utility function g on I and it is easy to see that for any constants 
a and b with a > 0 the function g’ defined as follows for every p in I 


g'(p) = ag(p) + b 
is also a utility function on J. 
In particular, since g(go) < g(q1), we can choose a and b so that 


ag(qo) +b =0 
ag(qi) +b =1 


and 
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Then we define the utility function wy,, with values u(p | I, g) for every 
p in I by 
u(p| I, g) = ag(p) + b 

D. For any two intervals J; and I> (each containing go and qı), 
for any utility function gı on J, and any utility function gə on Is, and 
for every p in I; N Io 

u(p | Li, 91) = u(p| Io, 92) 
Proof. If p~ qo or p ~ q the result is immediate. For each of the 


remaining three cases: (i) p < go, (ii) qo < P < M1, (iii) qı < p, there 
exists numbers A; 0 < A; < 1,7 = 1, 2,3 such that 


G) qo ~ (1 — M)p + Mugi 
(ii) p ~ (1 — r2)go + 2M 
(iii) qı ~ (1 — A3)go + AsP 


From the additive property of the utility function uz, we have in the 
three cases 


(i) 0 =M + (1 — Mulo | Ts, g) 
(ii) ulp | Ti, gi) = 2 
(iii) 1 = gulp | Zi, gi) 


Hence, u(p | Z1, 91) = (p | I2, g2) for every p in I, N Io. 
For any p there is an interval Z containing p, qo, and un: Define 
u(p) as the common value of u(p| I, g) for every I containing P, qo, 
and qı, and every utility function g on I. Then u(p) is defined for all 
p, and u is a utility function for every such interval I. 
E. The function u is bounded. 


Proof. Suppose, for instance, oy 
can find a sequence {pn} such that u(pn) > 2 
for all n. Let 


ao N 2. 
q= D2 Pa qn = ib 2p) + 2-"py 
1 
1 


that wis unbounded above. Then we 
and u(pr) > u(Pr—1) 


Since N $ 
g= DP 27"Dn F by 27 "Dn 

1 N+1 

and N s 


gn = 2 "Pa + > 2-*pn 
1 


N+1 
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then, using Hı, we have q > qy for all N, so that u(q) > u(qy) for all 
N. By hypothesis u(p,) > 2” and thus u(g) > u(gy) > N + 1, for all 
N, which is impossible. 

The proof that u is bounded below is similar. 

F. For any sequence pn and any sequence An > 0 with Zn = 1, 


u (x Wapa) = 5 Ant(Dn) 


Proof. In case x is zero except for a finite number of values of n, 
the proof is obvious, since there exists an interval I which includes the 
finite set of probability distributions p,. 

Suppose An > 0 for an infinite number of values of n (so that 


D An #0 for all N). Since u is a utility function on any interval I 
N+1 


we have 


(Sa) 


N+1 


N ao o 
u [ E MPa + ( 5, w) marapa | 
1 n=N +1 


N o a 
DS Anul) + E rate ( 2 urapa) 
1 


N+ n=N +1 


ð 
where uyn =An/ D, An. Since u is bounded, as N — 00 
N+1 


an ao 
D ra ( > vapa) 0 


N+1 n=N +1 


which completes the proof of F and also of the theorem. 

We remark that the loss function L which was defined in the previous 
chapter [Definition 3.5.2] takes as values the negative of the utility to 
the statistician of probability distributions on the space of outcomes R. 
It is in terms of the loss function that the theory of utility enters in 
statistical games. 

As a simple illustration of the concept of utility we consider the coin- 
tossing decision problem, with the three possible gains 1, 0, —1. The 
class P is the set of all distributions P = (Pi, Po, p-1) with p; > 0, 
Zp: = 1. If we suppose that (1, 0, 0) > (0, 1, 0) > (0, 0, 1), there is a 
one-parameter family of possible utility functions. For, fixing u(0, 1, 0) 
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= 0, u(0, 0, 1) = —1, we may choose any b > 0, and set u(1, 0, 0) = b. 
Then u(pı, Po, p—1) = bpı — P—ı. The number b measures the willing- 
ness of the decision maker to take risks. If b = 1, his utility is simply 
his expected gain; if b < 1, his expected gain must be definitely posi- 
tive before he will risk a dollar, whereas, if b > 1, he will undertake 
certain risks with negative expected gain (though positive utility). 

It is obvious that the determination of the utility of the probability 
distribution on R is seldom as easy as in the above example. In many 
applications the utility of a given probability distribution is identified 
with the expected monetary gain connected with it. However, this is 
not always possible, as, for instance, in decisions involving scientific 
research or human life. Another course often followed is simply to take 
as the value of the loss function the probability of making a mistake. 
The difficulties involved in applying utility theory to statistical deci- 
sions are often appreciable, but fortunately many of the considerations 
about statistical games remain the same for a large number of different 
loss functions. 

PROBLEM 

4.2.1. In the coin-tossing decision problem, define m(p) = pı — P- o°(p) = 

pi? + pi? — [m(p). Which, if either, of Hi, H2 is violated for the following pref- 


erence patterns? 2 7 
(1) p > qif and only if m(p) > m(q) or m(p) = m(q) and o°(p) < o*(q). 


(2) p > qif and only if m(p) > max (m(q), 1/2). 
(3) p > qif and only if m(p) — eo(p) > m(q) — <o(9)- 
(4) p > qif and only if pı > gı- 


4.3. Principles of Choice 


In terms of the utility function of the decision maker, the decision 
problem assumes a simple form. The problem may be represented as 
one of choosing a strategy in a game (Q, F, M) where X is the space of 
strategies of the opponent (in statistical games, a class of parameters 
determining probability distributions on a sample space), F is a uni- 
formly bounded collection of numerical functions defined on Q such 
that, if eF, i= 1, 2, 47s then, for A; > 0, 2A; = 1, Difi e F, and 
finally M(w, f) = —f(e), where f(w) is the utility to the decision maker 
of f when his opponent uses strategy w. 

Our coin-tossing decision problem again furnishes us an example. If 
ulpy, Po, p—1) = bP1 — Ps b > 0, the functions available to the de- 
cision maker are all convex linear combinations of fı = bw — (1 — w), 
fo = —w + D(1 — a), and fg = 0; i.e., all functions f = aw + B(1 — w), 


where (a, 8) is in the convex hull T of (b, —1), (—1, b) and (0, 0). 
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For b < 1, for instance, T is shaded in Figure 4. A decision is the choice 
of a point (a, 6)e T. For a given (a, 8), player IP's utility when the 
coin falls heads is a, and when the coin falls tails is 8; the over-all 
utility of (æ, 8) is aw + (1 — w), where w is the (more or less unknown) 
probability that the coin falls heads. 

An equivalent alternative statement of the decision problem is that 
we are given a bounded numerical function u defined on a product 
space 2 X D; the decision maker must choose exactly one de D; and 


Figure 4 


the utility of decision d in state w is u(w, d). The latter formulation 
exhibits most clearly the relation among games against an intelligent 
opponent, statistical games, and the general-decision problem. In the 
first case, w is chosen by an opponent whose utility is the negative of the 
decision maker’s; the statistical game is the case in which w is regarded 
as being determined by nature. In the general-decision problem w may 
for instance be partly chosen by nature and partly by various individ- 
uals whose utilities may be in any relationship to those of the decision 
maker. 

Having completed our description of the decision problem, we turn 
to, but do not answer, the question of how to solve it. An answer would 
take the form of a principle of choice, i.e., a rule specifying, for each set 
F, which f e F should be selected. Now in the case where w is chosen 
by an opponent whose utility is the negative of our own, i.e., the case 
of (zero-sum) games against an intelligent opponent, we have seen that 
the minimax principle furnishes a reasonable theory. 
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Minimax Principle. Choose d so as to maximize inf u(w, d). 


In our coin-tossing decision problem, the minimax principle tells us 
to choose the point (a, 8) e T whose minimum coordinate is maximized. 


b-1 
This point is (v, v), where v = max (0, 2). Thus, for b < 1, 


v= 0, and (0, 0) is attained only by not betting, for b=1,v=0, 
and any 7 = (Ài, àg, Ag) with à = àz is satisfactory, and, for b > 1, 
v = (b — 1)/2, and only n = (1/2, 1/2, 0) is satisfactory. [See Problem 
2.4.21] 

The minimax principle is distinctly less satisfactory in statistical 
games and in some situations prescribes courses of action that would 
be regarded as unreasonable by all but the most incurable pessimists. 

Suppose, for instance, that a manufacturer has a process for produc- 
ing fuses, producing defective fuses with a constant but unknown prob- 
ability w, a fuse being regarded as defective if it fails to blow out at 
20 amperes. He has produced a lot of 10,000 fuses and can either junk 
the lot (action 1) or sell the fuses at 10 cents each, with a double-your- 
money-back guarantee for each fuse that proves defective (action 2). 
His income, which we consider as his utility, if he junks the lot is zero 
and if he sells the lot is 1000(1 — 2). a 

Thus u(w, A) = 1000(1 — 2w), where à denotes the probability with 
which the manufacturer accepts the lot. We have min ulw, A) = 


—1000X, and the minimax principle requires the manufacturer to junk 
the lot if he regards all w, or even w > 1/2, as remotely possible. This 
conclusion is simply the reflection of the fact that, if o were chosen by 
a hostile opponent, w would be chosen as large as possible. Notice that 
introduction of the possibility of sampling does not change the minimax 
conclusion at all: If the manufacturer tests N fuses, and decides on the 
basis of the results whether to junk or sell the remaining 10,000 — N, 
a non-randomized decision function is a subset S of the integers 0, -+, 
N, specifying those integers $ for which he proposes to junk the lot if 
s defectives are found in the sample. We have 
N\ oy — aN 

ulw, 8) = (1 — 2w)(1000 — 0.1N) zae =a) 
and 

Ulo, n) = È Sulo, 8) 

5 


where 7 is any probability distribution over the class of subsets S of 
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J* = (0,1, +--+, N). Since for 1/2 < w < 1, uw, S) < 0 unless S = J*, 
we have 
Uw, n) <0 for 1/2<w<1 unless 7(J*) = 1 


and u(w, J*) = 0 for all w. Again the minimax procedure requires the 
manufacturer to junk the remainder of the lot, even if no defectives 
are found in a sample of size N if, before the inspection, he regarded 
any value of w between 1/2 and 1 as remotely possible. Thus the mini- 
max principle involves ignoring the results of the sample entirely, and 
is unsatisfactory to anyone who believes that the results of a sample 
are relevant in the situation described. 

One modification of the minimax principle is the minimaz loss or re- 
gret principle: 

Minimax Loss. Choose d so as to maximize 


v(d) = inf [u(w, d) — sup ulw, d)] 


The maximum attainable utility in circumstance w is sup u(w, d), or, in 
a 


other words, it is the least risk attainable, so that to maximize the func- 
tion v is to minimize maximum regret or loss. Let y(w) = sup u(w, d). 
d 


In the example described above, 

v(a) = (1 — 2w) (1000 — 0.1N) for w< 1/2 
and y(w) =0 for w > 1/2 
The minimax principle is applied to 


ulw, d) = u(w, d) — y() 
that is, for w < 1/2, 


14 (@, 8) = (2w — 1)(1000 — 0.1N) [1 = =) wh = an=] 


= (2 — 1)(1000 — 01N) © (2 wt(1 — a4 

and, for w > 1/2, u = u. ai 

For N odd, say N = 2k + 1, the set S* = (k + l, +++, 2k + 1), cor- 
responding to junking the lot if and only if more than half the sample 
items are defective, is easily seen from the following considerations to 
be a minimax loss strategy, i.e., a minimax strategy for player II in the 
game with payoff 

pilo, S) = ~u (w, 8) 
We note that 
pilo, S*) = p(l — w, S*) 
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Let w be a point in 0 < w < 1/2 with 


pi (#1, S*) = max pi(w, S*) = v 
If £ is the mixed strategy for I mixing w, and 1 — w; equally 


pi, S) = = (1 — 2w,)(1000 — 0.1N) (= ” h(s, 9) 


s=0 
where 
h(s, 8) = o° (1 — w1) for ses 


h(s, S) = (1 — o) for s¢S 
We obtain min p;(é, S) by minimizing A(s, S) for each s, i.e.; seS if 
S 


and only if : a 
wll — o) < (1 o or 


ie; 


ors > N/2. Thus 


min pı(, 8) = p1 (ë, S*) = v = max pi (o, S*) 
s w 


and S* is the minimax loss strategy- 

Some statistical games have y(w) = 0 for all w, $0 that, for these 
games, the minimax and minimax loss principles coincide. This occurs, 
for instance, when w is an unknown parameter that the statistician 
must estimate from a sample, and all values of w have equal intrinsic 
desirability, so that the risk depends only on the error in estimation. 
In any case, most of the statistical games for which the minimax strate- 
gies have been investigated are games with y(w) = 0. ea 

An objection often raised to the minimax and minimax loss principle, 
as applied to statistical games, is that they do not take into account 
any information the decision maker may have about w, except to the 
extent that this information rules out certain w’s as being absolutely 
impossible. If he can describe his information about w by a probability 
distribution ¢ over 2, so that E(w) represents the probability, based on 
his information, that w is the true state of the system, then the utility 


of action d is simply 


UG, d) = È w)ulo, d) 


116 UTILITY AND PRINCIPLES OF CHOICE Ch. 4 


Thus he maximizes his utility by choosing d so as to maximize U (¢, d). 
This principle of choice, based on £, is called a Bayes principle; and a d 
which maximizes U(é, d) is called a Bayes solution (of the decision prob- 
lem) against £ Bayes principles have the objection that, in most sta- 
tistical games, £ is simply an expression of the personal judgment of the 
decision maker, so that two decision makers, facing the same decision 
problem and using Bayes principles, might well reach different conclu- 
sions from the same data collected in the course of the game, if their 
a priori judgments £, before the data were collected, differed markedly. 
A further objection to Bayes principles is that they never require ran- 
domization; many statisticians consider that random sampling, which 
is a form of randomization, is useful. [See Section 8.7.] 

Bayes solutions will be studied in considerable detail in later chap- 
ters. We remark here that, in the fuse-manufacturing example dis- 
cussed above, every Bayes principle leads to the choice of an integer m, 
with S = {s:s > m}. 

In general, a principle of choice associates with each decision problem 
(Q, D, p) a preference ordering > on D, with decision d, being preferred 
or indifferent to d if and only if dı > dọ. The minimax principle leads 
to the ordering dı > de if and only if 


sup p(w, d1) < sup p(w, de) 


while the Bayes principle with respect to an a priori distribution & 
leads to dı > d if and only if 


ZE(w)p(w, di) < TE)p(w, de) 


One way to arrive at “reasonable” principles of choice is to specify 
various properties which “reasonable” principles should have, and then 
determine the class of all principles with these properties. As an in- 
stance of this method, we present the following discussion. We shall 


consider Q as fixed, and impose requirements on the class of preference 
orderings as D varies. We first consider 


Lı: There exists a preference orderin 
functions D such that if di > dy 
and də then dı > də for all D. 

The principle L; asserts that the relative desirability of dı and də 
does not depend on the other alternatives available to the decision 
maker. Notice that Bayes and minimax principles satisfy Lı. How- 
ever, minimax regret does not ; as the following example illustrates. 


g > on the space of decision 
for some subset of D containing dı 
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Consider a finite game with matrix 


=. 

8 

w 
S] 
© 


W = 


bo 
© 
ow 
or 
8 


The “regret” matrix obtained from W is W itself so that minimax re- 
gret strategy for player II is to select də with probability one and the 
others with zero probability. If, however, we now delete column 1 
and convert the resulting matrix to a regret matrix, we obtain a new 
game with matrix 


in which the minimax regret strategy for player II is to select da with 
Probability one and the others with probability zero. Thus, which of 
the two strategies də or d3 is selected depends on whether or not d; is 
present. ; 

A principle satisfying Lı determines a preference ordering > on the 
set F of all bounded functions f on Q, and we shall formulate our next 
requirement: in terms of this ordering. It is 

Lo: If f(w) > g(w) for all w, then f > g. i 

The meaning and reasonableness of Le are obvious, and Le, appropri- 
ately formulated, is satisfied by all principles of choice that have been 
Proposed. _ Sse 

A requirement that is not satisfied by the minimax principle is 

L3: If fy > fo, then fı +g > fo + g for all g. 

This principle asserts that, in deciding between fı and fo, the deci- 
Sion maker is guided only by the change in utility, as a function of the 
unknown state w, resulting from a change from fı to fo. 
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Theorem 4.3.1. If Q is finite and > satisfies Lı, Le, and Lg, then 
either all f’s are indifferent or there is a probability distribution on 
Q such that fı > fe whenever 


Etoile) > YX Eloo) 


Proof. If Q consists of n points, w1, «++, wn, then there is a 1-1 corre- 
spondence between numerical functions f on 2 and vectors x in n space 
with 

x = [f(r), +++, Flon) 


Thus > may be considered as a preference relation among vectors. 
Denote by P the set of all vectors p = (pi, +++, Pn) with p; > 0 for 
i = 1,--+-,m. Then from Le we conclude that for all p in P 


p> e= (0,0, ---, 0) 
If there exists a po in P such that po ~ e, then from Ls we see that 
2Po ~ Po + Po ~ e + Po “Po ~e 
and 
—Po ~ & — A — Pow & 
and thus Npo ~ e for every positive or negative integer N. Since the 
components of po are all positive, for every n-dimensional vector x there 


is an integer N, with Nipp — x in P and there is an integer No with 
x — Nopo in P. And, since Nipo — x > e and x — Nopo > e, we have 


Nipo => x and x > Nopo 


But Nipo ~ e ~ Nopo, so that x ~ e for all x. Thus, unless all vectors 
are indifferent, p > e for all pin P. We restrict attention hereafter to 
the case p > e for all p in P. In this case, we have also q < e for 
—qeP, since —q > e implies —q + q > e + q. 

Let S be the set of all vectors s > e, and consider the set Q of all 
vectors representable as s + p, where se S and pe P. Taking s = e, 
we see that P C Q. Also p > e, and therefore by La 


s+p>s+e>e 


so that Q C S. Thus PCQCS. Moreover, Q is an open set since 
P + s is open and 


Q= U P +s) 


We show that Q is convex; since Q is closed under addition, it is suffi- 
cient to show that if xe Q and à > 0 then Axe Q. Suppose that x = 
So + Po with So in S and po in P. Then (M/N)so €85 for all positive in- 
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tegers M and N, since (M/N)so < e would imply Msp < e, while 
So > e implies Msp > e. If \>0 and peP, then clearly àp eP. 
Thus for à > 0, 

A(So + Po) = kso + Pi 


where k is a positive rational sufficiently near à and pyeP. Since 
kso £ S, we have Axe Q. Thus Q is an open convex set, and e is a bound- 
ary point of Q. 

By corollary 1 to Theorem 2.2.1, there is then a supporting hyper- 
plane to Q through e; i.e., there is a vector E = e with -x > 0 for 
xeQ. Each (i) > 0, for, if there were a Eli) < 0, then, since P C Q, 
there would exist an x in Q with component 2; relatively large enough 
to have £-x < 0. Since £(i) > 0 and § # e, we may normalize to ob- 


tain > (i) = 1. Furthermore, §-x > 0 for all x in S. [See problem 


1 
4.3.4.] Thus, if x > y,x — y €S and È- (x — y) >0. Hence§-x > §-y. 
Consequently, whenever §-y > §-x we have y > X. 

The meaning of the theorem is this: Anyone who uses a principle of 
choice satisfying Lı, Lo, and Ls is acting as if he regarded the states 
1, +++, n as having probabilities (1), +*+, (n) of occurring, at least 
insofar as the probabilities (1), ++, t(n) are decisive in differentiating 
between two actions. Our theorem does not guarantee -x = -y im- 
plies x ~y, and a simple two-dimensional example satisfying Lı, Le, 
and L3 where this fails is as follows: 

(@1, y1) > (xe, ya) if either zı > %2, OF Tı = T2 and yi > ye 
The £ vector is (1, 0), so that the decision maker regards state 2 as 
having probability zero. Nevertheless, between two actions yielding 
the same income in state 1, he prefers that yielding the larger income in 
State 2. 

The above theorem for the special case when @ is finite as well as the 
discussion of the fuse-manufacturing example indicates rather clearly 
that, without assuming an a priori distribution on 9, there is in terms 
of the present theory no adequate principle of choice for choosing a 
Strategy in a statistical game. This is one reason, though by no means 
the most important one, why Bayes principles of choice will be studied 
in detail in succeeding chapters. It is also the reason why we turn to 
the study of various classes of strategies in the next chapter, the de- 
velopment of which is useful in studying arbitrary games as well as 


Statistical games. 
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PROBLEMS 


4.3.1. Consider the hypotheses 

La: fi > foimplies \f1 + (1 — A)g > Afe + (1 — A)g for all g e F and all à in (0, 1). 

Ls: If fn > g for all n and fale) — f*(w) for all œ, then f* > g. Assume that 2 is 
finite and that the preference relation > on all the vectors satisfy Lı and Le. 

(1) Show that L4 implies Lg. 

(2) Show that L3 and Ls imply Ly. 

(3) Show that L4 and Ls guarantee that the equality 


E Eio) = D Eoo) 


implies fı ~ fo. 

4.3.2. For any numerical function h(t) satisfying h(t, + t2) = A(t) + h(t), the or- 
dering > defined for two-dimensional vectors by x = (zı, 22) > y = (yı, yə) if (a) 
zı +22 > y1 + yo or (b) z1 + 22 = yı + y2 and A(zı) > h(y) satisfies Lı, Le, and 
Lz. 
4.3.3. It is known that there are discontinuous functions h satisfying h(t, + t2) 
= h(t) + A(t). This provides an example showing that L3 does not imply Ly. 
Employ the solution h(t) = ¢ to prove that L4 does not imply Ls. 

4.3.4. A set R in n space is said to be a cone if whenever xe R, Axe R for à > 0. 
Thus the set Q of Theorem 4.3.1 is a cone. Is the associated set S the closure of this 
cone? Explain. 


CHAPTER 5 


Classes of Optimal Strategies 


5.1. Introduction 


In the previous chapter we considered some general principles of 
choice of optimal strategies in games where one of the players is not 
faced with an intelligent opponent but rather with an unknown state 
of nature. It was pointed out that no single principle thus far advanced 
seems to be compelling enough to insure a universal agreement on a 
rule for selecting a particular course of action among many that might 
be available. However, while disagreement might exist on what to 
do in a given situation, it might be possible to get full agreement on 
what not to do. To illustrate this point, let us assume that we are 
considering a game G = (9, D, p) where the possible states of nature 
are represented by points w in 9, the possible decisions of player II 
by points d in D, and the risk to player II by p(w, d). Suppose further 
that a strategy n for player II is proposed as a possible candidate for 
Consideration. Two cases may arise: (a) No other strategy can be 
found that is better than n. Technically, this means that there exists 
no n* such that p(w, 7*) < e(w, n) for all w with inequality holding for 
Some w. (b) There does exist a strategy n* that is better than n. In 
Case a the strategy ņ can rightly be designated as admissible but not 
necessarily preferred since other admissible strategies will be competing 
for attention. In case b, however, 7 could clearly be dropped from 
Consideration in favor of 7*. 
_ The above discussion suggests that in games played against nature 
it would be highly desirable to construct a class @ of strategies such 
that no reasonable principle of choice would lead one to select a strategy 
that was outside of this class. To satisfy this requirement the class € 
should possess the property that for every strategy 7 for player II that 
1s not in @ there can be found a strategy n* in @ which is better, i.e., 
Which is as good for all possible states of nature and actually better 
or some states. A class @ of strategies with this property is called a 


Complete class. 
121 
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It is easy to see that a complete class will necessarily contain all 
admissible strategies. It is not true, however, that a complete class 
can always be restricted to only such strategies. The reason is that, 
if a strategy 7 is not admissible, it does not follow that any of the 7*’s 
that are better than it will necessarily belong to the admissible class. 
If the class of admissible strategies is complete, it is the smallest com- 
plete class, and, if a minimal complete class exists, it is identical with 
the admissible class of strategies. It will be shown in this and later 
chapters that the admissible class is complete in many games of sta- 
tistical importance. It is obvious that for any game a complete class 
of strategies exists, namely, the class of all strategies; on the other 
hand, there are games in which the class of admissible strategies is 
empty. 

The following example will serve to illustrate some of the concepts 
mentioned above: A purchaser of an oil-bearing seed is engaged in 
crushing seed for its yield in oil. The oil yield in pounds per bushel of 
seed is the value x of a random variable which is assumed to be nor- 
mally distributed with mean @ and a known standard deviation co. 
The price of a pound of oil is r and of a bushel of seed is q. If other 
costs incidental to the crushing operation are ignored, the profitability 
of accepting and utilizing a lot of N bushels of seed depends only on the 
expected oil yield per bushel, i.e., expected profit = N(rd — q). Clearly 
it would pay to reject the lot whenever N(7@ — q) < 0 and accept the 
lot whenever N(r@ — q) > 0, whereas acceptance and rejection are 
equally profitable when N(r@ — q) = 0. In other words, it is profitable 
to accept the lot when @ > bo = q/r and to reject the lot when 0 < bo. 
If lots with “quality” @ > ðo are rejected, then the cost of this error 
will be assumed to be L1(6) = N (rð — q), and, if lots with quality 
0 < Oo are accepted, then the cost of this error will be assumed to be 
L2(0) = N(q — rð). 

Suppose the purchaser is supplied with a sample of n bushels of 
seed upon which to make his decision. Then a pure strategy for him 
would be a rule that would specify for every possible sample point 
X = (1, T2, +++, Xn) whether to accept or reject the lot. A mixed strat- 
egy for the purchaser can quite generally be described {see Theorem 
7.2.1] as a function ¢ with 0 < v(x) < 1 such that if x = (21, +++, @n) 
is observed he will accept the lot with probability (x) and reject the lot 
with probability 1 — g(x). In terms of these mixed strategies the situ- 
ation may be described as a game T = (O, ®, p) where © is the real 
line, @ is the class of integrable functions p, 0 < v(x) < 1 defined on 
n space and 


Sec. 5.1 INTRODUCTION 123 


(0, o) = ( se) n0 ii s fo = vg AE Sa 


for 0 > o, and 


00, ¢) = (ano fl feo BOO Ps 


for 0 < bo. 

It would appear that, if the purchaser knew nothing more about the 
situation than what was described, he would have to consider every 
element of P as a possible candidate for a final choice of a rule of ac- 
tion. Fortunately, this is not the case. It will be shown in Section 
7.4 that only a small subclass &* of g’s defined by g(x) = 1 if > k 
and g(x) = 0 otherwise, where = = Xx;/n and k any real number, are 
worthy of his attention, and no other strategy need be considered. 
The reason for this is that the subclass #* is a complete class, and in 
fact every element of &* is an admissible strategy. 

In addition to the admissible class there exist other classes of strate- 


gies which may or may not be complete but which from the point of 


view of decision making are of interest and will be discussed in this 
pters 1 and 2 will be 


chapter. The theory of games developed in Cha 
used as the tool for defining and characterizing all the optimal classes 
we shall be concerned with. Thus the same theory that served to delineate 
optimal strategies in games played against an intelligent opponent will 
serve to delineate classes of optimal strategies in games played against 
nature. Since the definitions of these classes of optimal strategies de- 
Pend in no way on special assumptions about statistical games, the for- 
mal definitions are stated in terms of arbitrary games. That these con- 
cepts may be found useful in general games is illustrated by the fol- 
lowing example: Consider the two-person zero-sum game G = (X, Y, 
M) with X = Y = (1, «++, 100), M@, 4) = min (x, y). Any strategy 
for Tis a good strategy; II has the unique good strategy y = 1. How- 
ever, it is clear that the only admissible choice for I is x = 100. Thus, 
even if the minimax principle is employed, the problem of optimal 
Classes of strategies becomes important in case the minimax strategy 


1S not unique. 


PROBLEMS 


5.1.1. In the illustrative example given above, let N = 1000, n = 36, e = 10, 
T = 15 cents, q = $3.00. Let ¢* bea strategy for the purchaser defined as follows: 
Accept the lot if # > 20 lb., reject the lot if Z < 20 Ib. Consider another strategy 
¥ defined as follows: Let m be the number of bushels of seed in the sample whose 
oil yield is greater than 20 lb. Then accept the lot if m > 18, reject the lot if 
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m < 18, and if m = 18 toss a coin and accept the lot if the outcome is “head,” reject 
the lot if the outcome is “tail.” 

(a) Plot p(0, »*) and p(6, 4) as functions of 9, and thus show that y is not admissible. 

(b) Show that ¢* is a minimax strategy, assuming that 6* is an admissible com- 
plete class. 

5.1.2. Give an example of a game in which the class of admissible strategies is 
empty. 

5.1.3. Player I chooses a number 9, 0 < 0 <1. A coin whose probability of fall- 
ing heads is @ is tossed N times, and player II bets $1.00 on the result of each toss, 
each bet made with knowledge of the results of previous tosses. Show that the mini- 
max strategy for II of betting on heads and tails with equal probability at each toss 
is not admissible and that a better minimax strategy is: For every n, let £n, yn de- 
note the winnings on those previous tosses when II has bet on heads, tails respec- 
tively. (£n and yn may be negative.) At the nth toss, II bets on heads if z, > y 
tails if £n < yn, and chooses heads and tails with equal probability if tn = yn. 


ny 


5.2. Complete Classes of Strategies in S Games 


In many important decision problems, we are confronted with a game 
in which nature has only a finite number of possible states. In such 
cases it is possible to obtain a full description of the structure of the 
complete classes under quite general conditions. 

It was shown in Section 2.4 that a game G in which player I has a 
finite number n of pure strategies is equivalent to an S game, where S 
is a bounded subset of the n-dimensional Cartesian space Sn. A pure 
strategy for player II is a choice of a point s = (s1, S2, +++, Sn) in S, 
for player I a choice of an integer i = 1, 2, +++, n, and the payoff to 
player I is the number s; Let S* be the mixed extension of S. Then 
S* is a convex subset of Sn, and each point in S* represents a possible 
mixed strategy for player II. It follows from the boundedness of S 
that S* is bounded. From Definition 2.4.1 we also see that the space 


= of mixed strategies for player I is a closed, bounded convex subset 
of Sn; i.e., 2 = the set of all points 


E= EO, ++), Oo Yew a1 
i=1 

In Section 5.4 we shall define variou 
trary games, and thus, a fortiori, for S 
informally state for these games the defi 
sible strategies and the class @ of Bayes strategies along with informal 
definitions of several other classes. These definitions refer to an S 
game (In, S, M) and its mixed extension (=, S*, M), where, for £ in Z 
and s in S*, MẸ, s) = £-s = D Esi (Strictly speaking, (=, S*, M) 


i=1 


s classes of strategies for arbi- 
games, but for convenience we 
nitions of the class @ of admis- 
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is only equivalent to the mixed extension of (In, S, M), but no confu- 
sion will result from this identification.) 
(1) @ = the set of admissible points in S*; i.e., a point a = (a, 
+, an) in S* belongs to G if there exists no s in S* such that s; < a; 
for all ¿ with inequality holding for at least one i. 
(2) @ = the set of Bayes points in S*; i.e., a point b = (bi, ---, bn) 
in S* belongs to @ if there is a Ẹ in Z such that 


-b = min (&-s) 
se S* 


(The point b is then said to be Bayes against Ẹ.) 


(3) =, = the set of all ë in = with £(¢) > e for all i. 

(4) Z4 = the set of all § in Z with (i) > 0 for all 7. 

(5) D, = the set of all s in S* which are Bayes against some § in Ze 
Because of its greater importance we give a formal definition of the 
next class of strategies to be introduced. 


Definition 5.2.1. Let (=, S*, M) be the mixed extension of an S game, 
and let =, be the set of all Ẹ in = with (i) > 0 for all i. Then D = 
the set of all s in S* which are Bayes against some § in Z4}. We denote 
the closure of D by D. 

We turn now to a number of theorems about the above classes of 
Strategies in S games. It is clearly seen that the defining properties of 
the classes Q, ®, D, and D remain unchanged if to each element of S 
we add a constant vector r, or, equivalently, if we translate the set S 
in Sn. 

Theorem 5.2.1. Let (=, S*, M) be the mixed extension of an S game, 
let s* = (s*,, ---, s*,) be a fixed point in S*, and let Tx be the set of 
Points in n space defined by t € Ts» if t; < s*; for all i. Then a neces- 
sary and sufficient condition that s* e @ is that Ts» N S* is empty. 


Proof. Suppose T N S* # Ø; say Sı€ Taf S*. Then, since 
S1 £ Ta», we have sı; < s*; for all z, and hence, for all ë in =, 
-sı < §-s* 
which contradicts the hypothesis that s* is Bayes, i.e., that, for some & 


mE, 
€-s* = min (§-s) 
se St 


This proves the necessity of the condition. To prove the sufficiency 
of the condition, we assume that Ts» N S* = Ø. Since T+ is open and 
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s* is a boundary point of T+, then by corollary 2 to Theorem 2.2.1 there 
exists a hyperplane §-x = c such that Ẹ-s* = c, -t < c for all te T 
and -s > c for allseS*. Let 6; be the unit vector with 1 in the ith 
coordinate. Then s* — §;e 7, and 


€-(s* — 8;) < c = E-s* 
so that §-5; > 0 and thus é(¢) > 0 for all i. Hence, we can assume, 
without loss of generality, that J> (i) = 1 so that e=. Clearly s* 
i=l 


is Bayes against this §, which completes the proof of the theorem. 


Figure 5 


We observe that in any S game the class of good strategies for player 
II form a subset of ®. Thus Theorem 5.2.1 simply carries further the 
geometric ideas introduced in Section 2.4 following the proof of Theo- 
rem 2.4.2, Here again we point out that in a game G = (X, Y, M) in 
which X is finite, the requirement that M is bounded can be replaced 
by the weaker requirement that M is bounded from below, and the 
geometric characterization of the class @ given by the above Theorem 
remains valid. 


Theorem 5.2.2. Let (=, S*, M) be the mixed extension of an S game, 
where S* is closed. Then the class ® of Bayes strategies in S* is closed. 


Proof. Let {bm} be a sequence of points in ® such that bn —> b 
as m — œ. We wish to show that be@®. Since b,e@ then bm is 
Bayes against some §” eZ; i.e., Ẹ™ -bp < E-s for all seS*. Since 
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= is closed we can choose a subsequence of {$°} which converges to 
§*e=. Letting m — © through this subsequence yields &*-b < &*-s 
for allseS*. Thus b is Bayes against §*, and hence be @. 


Theorem 5.2.3. Let (Z, S*, M) be the mixed extension of an S game, 
where S* is closed. Then, 


DCeCDCB 


Proof. Clearly D C D, and DC &. Since @ is closed by Theorem 
5.2.2, DC @. It remains therefore to show that (a) D C @ and (b) 
Qc. 

Proof of a. Let de®. Then there is a E = (EC), «++, &(m)) with 
EG) > 0 for all d such that -d < §-s for all se S*. Hence there can- 
not be an s e S* with s; < d; for all 7 and actual inequality for some 7, 
as this would imply that £-s < -d for that s. Thus de @. 

Proof of b. Letae@. To simplify the proof it will be assumed with- 
out loss of generality that a = e where e = (0, +++, 0). Consider the 
two sets S* and =, for e > 0. Since both of these sets are closed, con- 
vex, and bounded, then by Theorem 2.4.3 there exists a point s“ © S* 
and a point £© eZ, such that 


(1) g.s < £0.50 < E.s 


for all s e S* and Ẹ £ Ze We shall show that we can choose a sequence 
€m — 0 such that the corresponding sequence S m) > e. 
Since by assumption e e S* we have by (1) 


(2) £0 .s© < £O-e =0 


0 such that &&” — &* and sm — s*, 
Now, by (1) and (2), Es < 0 for all § E Be Therefore, since fon any 
£E}, È e Ze, for sufficiently large m, it follows that, for such m, §-s 
< 0 for all £=} and hence £-s* < 0 for all PeR Now, for any 
€Z, there is a sequence {Ẹ™}, E™ eZ, such that § — $. Since 
§™).s* <Q for all m, we have £-s* < 0 for all bez. In particular, 
8:-s* = s*, < 0 for each ¿ where 6; is the vector which has unity for the 
ith coordinate and zero for all other coordinates. But, by assumption, 
e is admissible. Hence s*; = 0 for all i; i.e., s* = an Tie we have ex- 
ibited a sequence s“™ — e such that, a each "m s‘“” is Bayes against 

a£ e, so that e e D and consequently @ C 2. 
We shall now give two examples which will show that (1) D # Q 


and (2) @ # D. 


Choose a sequence em 
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Example 1. Let S* consist of all points (x, y) such that (x — 1)? + 
(y—1)? <1. Then it is easy to see that all admissible points lie on 
the are of the circle (x — 1)? + (y—1)? = 1 with O<2, y<1. 
Thus the points (0, 1) and (1, 0) are admissible, but they are not Bayes 
against any Ẹ = (&(1), &(2)), &(1) + £2) = 1 with (ù) > 0. Thus 
DÆ. 

Example 2. Let T = the set of triples (x, y, 1) with 0 <2, y < 1 
and (x — 1)? + (y — 1} = 1, and let ao = (1, 0, 0). Let S* be the 
convex set determined by T and ap. Consider the point to = (xo, yo, 1) 
eT with 0 <2 < 1. Then there are positive numbers £1, 2, with 
tı + £2 = 1 such that 


a) tfi + Yt > Toti + Yoke 


for all t = (x, y, 1) e T with t # tọ. (The ¢;’s are given by the tangent 
line to the circle (x — 1)? + (y — 1)? = 1 at the point (xo, yo).) In 
particular, setting x = 1, y = 0, we get 6 > toti + yor. Now, pick 
e > 0 so small that e 


ç 
(2) fi > toti + yota + —— 
= 2 
and consider the vector Ẹ = ((1 — ef, (1 — e)f2, €)€ Z4. Then, by 
(1) and (2), a simple calculation gives us the inequalities 
68) E-t > E-to 
for all t e T, and 


(4) -ao > E-to 
Taking convex linear combinations of (3) and (4) yields 
(5) Es > §-to 


for all s €S*; i.e., to is Bayes against È; consequently toe D. Now let 
to > 1, Yo — 0. Then (zo, yo, 1) — (1, 0, 1) and (1,0,1)eD. But 
(1, 0, 1) is not admissible since (1, 0, 0) is better. Thus @ # D. 


Theorem 5.2.4. Let (=, S*, M) be the mixed extension of an S game, 
where S* is closed. Then the sets Q, D, and @ in S* are complete. 


Proof. If a subclass of a class of strategies is complete, then the 
class itself is complete. Thus, in view of Theorem 5.2.3, it is sufficient 
to show that @ is complete.. That is, we have to show that for every 
seS*, s¢@ we can find an ae@ such that a; < s; for all ¿ with in- 
equality holding for at least one 7. 
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Figure 6 


Figure 7 
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Let soe S*, so¢@, and let Sı be the set of all points se S* with 
Si S Soi forall i. Then S; is closed. Let ap bea point in S* which mini- 


n 
1 , A 
mizes f(x) = >> 7 tion Sı. Then, since S; is closed, ao € S1. Further- 
i=1 


more ap is admissible since (1) by Theorem 5.2.3, ao is admissible in 
S; as it is Bayes against a Ẹ with positive components: viz., È = (1/n, 
1/n, «++, 1/n); (2) no point of S* outside of Sı can dominate ap for, if 
Sı # S1, we have si; > so; for some 7 while Qoi Ê so; for all i. Finally, 
for at least one 7, ag; < so; since otherwise So would also be admissible 
which is contrary to our assumptions. 

Thus far we have assumed that nature has a finite number of pos- 
sible states, but no limitations were set on the possible actions (pure 
strategies) the statistician might have. If we assume that the statis- 
tician too can have only a finite number, say m, of pure strategies, 
then the game S consists of a finite number of points in S,, and S* is 
the convex hull of these points. That is, if Z1, --+, Zm are the points 
in S then every point s e S* is given by 


m 


s= $ nie, +, ¥ wien) = DY ali)z; 
j=: j=1 j=l 


where (j) > 0 for all j and > (i) = 1. For this special case we have 
j=1 
the following. 


Theorem 5.2.5. Let (=, S*, M) be the mixed extension of an S game, 
where S* is spanned by a finite number of points, say 21, +++, Zm. Then 
there is a finite number of Ẹ’s in Ey, say, --., £ such that D = 


Q = D = the set of all s in S* which are Bayes against some °, for 
i=l, k. 


Proof. Before we prove this theorem, we prove the following lemma. 


Lemma 5.2.1. If seS* is Bayes against a $ and s = > n(j)z;, then 
j=1 


every z; for which 7(j) > 0 is also Bayes against £. 


Proof. Let s be Bayes against È. Then E-z; > -s for all j, and 
hence (j)(€-z;) > n(j)(E-s) for all J. But the inequality sign cannot 
hold for any j for which n@) > 0 since we would then have 


m 


Es = a0) E-z) > EEs) =§s 
j=1 


j=1 
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which is impossible. Thus, for all 7 for which (7) > 0, we have 
n(g)(&-z;) = n(y)(E-s), and hence §-z; = §-s, which proves the lemma. 
We now return to the proof of the theorem. 

k 


We shall show that D = [LJ 7; where T; is closed and the elements of 


i=l 
T; are Bayes against a common $ eZ,. From this we will have D 
closed; hence D = D; and, since D C @ CD, the conclusion will follow. 
Let vı, «++, Vam be the 2” subsets of the set of integers {1, +++, m}. 
Let T; be the convex hull of the set of those z;’s for which jev; Let 
U; be the set of all s e D for which s can be represented by s = De n(j)2; 
Jeu 
where 7(j) > 0 for jeva 2, a(g) = 1. Since some U; may be empty, 
yen 
let us assume for simplicity that U; is not empty for i < k and is empty 
fori > k (k < 2”). 
Now, for i < k, let soe U; and Sp be Bayes against E% e Z4. We 
can write so = >> no(J)zj where mo(j) > 0 for j evi and >> m0(j) = 1. 


jevi P jeri 
Then, by Lemma 5.2.1, z; is Bayes against £ for each jevi For, 
se Ti, we can write sı = 2, m(j)Zj m(j) = 0, pa m(j) = 1. Then, 
jen jevi 
for each s e S*, 
E.s; = pe m()E® -z3 < > m(jE+s = £M.5 
jen jen 
Hence, each element of T; is Bayes against £ =, and this for i = 1, 
2, +++, k. Clearly, from its definition T; is closed. 
k 


T;. To show the opposite inclusion, sup- 


me 
The above shows D > U 
i=l 


Pose So £ D, So = 5 no(j)zj, and suppose the set of j’s such that 79(7) > 0 


j=l 


k 
isv,. Then sọ £ U, C T, Hence, D C > 7; and the proof is complete. 
i=1 
k k 
It might be noted that we have shown D = U T= U Ui 
hapter 2, but, since it is a 


The following result properly belongs to C ] 
we state it as a corollary. 


direct, consequence of the previous theorem, 

Corollary. Let (Z, H, M) be the mixed extension of a finite game 
G = (X, Y, M) with matrix II azl i= 1,001 %I = l, «+s, m. Then 
there is a good strategy £ = EU), <*> E(n)) for player I such that 
EC) > 0 for every pure strategy Ti in X which is Bayes against all good 


Strategies of player II. 
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Proof. Let R be the set of all i’s for which z; is Bayes against every 
good strategy of player II. Then, if I is restricted to 7’s in R, the value 
of the game is unchanged, since every good strategy £ for I will have 
Eli) = 0 for ig R. 


m 
If 7 is any good strategy in the new game, then >> a;(j) = v for 
aa 
m 
ie R, for suppose > anma) <v for some ij¢R. We can choose a 
j=1 


good strategy ņ* in the original game with >> aim*(j) < v for all i g R. 


j=1 


For every e > 0, the strategy N = (1 — e)n + en satisfies D aimi) 
j=l 
< v for ie R, and the inequality will hold for i £ R for sufficiently small 


e. Then, for sufficiently small e > 0, m will be a good strategy in the 


original game, and, since D anali) < v, io is not in R. Thus we may 
j=1 


suppose that every zx; i = 1, +++, n is Bayes against every good strategy 


7 for II, and must prove that there is a good strategy € with (i) > 0 
for all i. 


Let 21, +++, Zm be the column vectors of || az; ||, and let S* be the 
convex set determined by Zi, ***,Zm. Then G is equivalent to the S 
game (In, S, M), where § = {Z1, +++, Zm}, and (E, H, M) is equivalent 
to (Z, S*, M’), where, for Ẹ in Z and s in S*, M'(E, s) = ë-s. The point 
s* = (v, +++, v) is in S* and is admissible, where v is the value of G. 
By Theorem 5.2.5, there is a eZ} such that s* is Bayes against E; 


ie., §-s* < £-s for all s in S*. That is, -s > v for all s in S*, which 
completes the proof, 


PROBLEMS 


5.2.1. Let x = (2, z2) be any point in 2-space, and let K = {x: (xı — 8)? + 


(z2 — 8)? < 100}; Q = {x0 Sa, z2 < 4]; R = [x:2 < t1 < 3, x =0); P= 

{x: 21 = 0, 2 = 2}. Consider an § game with S = K A Q-(P U R). (a) Draw 

the set S, and describe the sets D, Q, and @. (b) Show how 

the fact that, if S* is not closed, there need not exist a minir 
5.2.2. Let G be a finite game with matrix 


this example illustrates 
mal complete class. 


a=(7 3152 it 2 aw s 
12 3 948 9 3 12 510 


and let S* be the mixed extension of the cor: 
set S*, and compute the strategies &), £2) 
5.2.3. Let S* be the mixed extension of a 


responding S game in 2-space. Draw the 
»7*+, €® guaranteed by Theorem 5.2.5. 
nS game in n Space, and let be @. Then 
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(1) b= Ð a0)s® where r <n; s = (9, ---, sn”) and sP eS for all j; 
j=1 
r 
aG) > 0 for all j, X n(j) = 1. 
j=1 
(2) If € is an element of = against which b is Bayes, then s” is also Bayes against 
the same $ for all j. 
5.3. The Class of Games G, 


Definition 5.3.1. Let G = (X, Y, M) be a game and (Z, H, M) its 
mixed extension, and, for every 7 in H, z in X, and y in Y, let 


M(x, y) = M(x, y) — M(, n) 


Then we call the game G, = (X, Y, M,) a game derived from G (or 
simply a derived game). 

For simplicity of notation in dealing with derived games, we write 
v*, and \*, for the upper and lower values of T}, the mixed extension of 
G,. Elements of H are denoted by n, #, and v. 


Definition 5.3.2. Let G = (X, Y, M) be a game, and let 7, « be any 
two strategies in H, and let G,, G, be the corresponding derived games. 
The strategy u will be said to be at least as good as 7 if T,(u) < 0; it 
will be said to be better than 7 if in addition T,(n) > 0. The two 
strategies u and 7 will be called equivalent, written u = n, if T,(«) < 0 
and T,(n) < 0; i.e., M(x, u) = M(@, n) for all x. 

Note that for a strategy u to be called better than ņ, there must exist 
at least one x = zo such that M(xo, u) < (ao, n). 


_ Theorem 5.3.1. Let G = (X, Y, M) be a game. Then in every de- 
rived game G, = (X, Y, M), v*, <0. 
Proof. By Definitions 5.3.1 and 1.6.2, T,(n) = 0. Hence 
vt, = inf T,(u) <0 
peH 


An immediate consequence of this theorem is that all admissible strat- 


egies in the original game G can be found only among the class of 7’s for 
which v*, = 0. To see this, suppose v*, < 0. Then there exists a strat- 


egy u such that T,(u) < —8 < 0; i.e., sup [M (x, x) — M(x, m] < —<0 


€ . 
or, what is equivalent, M (e, u) < M(E, n) — è for all x, so that 7 is not 
admissible. This remark is made as a partial motivation for studying 
the class of games G,. As will be seen below, many classes of optimal 
strategies will be defined in terms of the derived games G, and the 
Properties of these classes will depend on the properties of these games. 
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Theorem 5.3.2. Let G = (X, Y, M) be a game. A necessary and 
sufficient condition that any derived game G, = (X, Y, M,) have a 
value and the value be zero is that \*, = 0. 


Proof. Assume that \*, = 0. By Theorem 1.6.1, A*, < v*, so that, 
by Theorem 5.3.1, we have 


(1) At, =v, = 0 
Conversely, if the value of G, is zero, then (1) holds. 
Theorem 5.3.3. Let G, = (X, Y, M,) and G, = (X, Y, M,) be any 
two games derived from G = (X, Y, M). Then 
Tylu) + v*, > v*, 


Proof. Let n, p, and v be any three elements of H. Then, since for 
any functions f and g 


sup f(z) + sup g(x) = sup [f(x) + g(x)] 
we have 


(1) sup [MZ (x, x) — M(x, n)] + sup [M(a, v) — M(x, u)] 

> sup [M (x, v) — M(x, n)] 
and (1) is equivalent to * 
(2) Talu) + Tpl) > T,(r) 


Taking the infimum with respect to » on both sides of (2) yields the 
theorem. 


-Corollary. Under the conditions of the theorem 


Talu) + Tum) > 0 


Proof. Letv = yin equation 2 above. 


Theorem 5.3.4. Let (=, H, M) be the mixed extension of the game 
G = (X, Y, M), and suppose that there exists an n in H such that u 
in H is a minimax strategy for player II in the derived game G,. Then, 
in the derived game G,, vt, = 0: 


Proof. Since y is minimax in G,, we have, by Definition 1.8.5, 


T,(u) = ut 


so that, by Theorem 5.3.3, v* 


120. But, by Theorem 5.3.1, v*, < 0. 
Therefore, u*, = 0. 
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PROBLEMS 


5.3.1. Give a geometric interpretation of the game G, in case G is an S game. 
6.3.2. Give a geometric interpretation of all the theorems proved in this section 


in case G is an S game. 


5.4. Definition of Classes of Optimal Strategies 


In this section we formally define various classes of optimal strategies. 
In order to avoid repetition in the definitions, we shall omit statement 
of the condition that we are dealing with an arbitrary game G = (X, 
Y, M), its mixed extension T = (Z, H, M), and the games G, derived 
from G. The same rule shall be followed in the statement of theorems 
and definitions in the remainder of this chapter. 


Definition 5.4.1. Suppose e > 0. The class Me is defined as the 
class of all strategies » in H for which there exists an 7 in H such that 


T(z) <v*%, + € 


The class M, consists of all strategies for player II that are within «e 
of being minimax in some game Gy. 

Definition 5.4.2. The class M is defined as the class of all strategies 
u in H for which there exists an 7 in H such that 

Tylu) = vër 

The class M consists of all strategies for player II that are minimax in 
Some game Gy. 

Definition 5.4.3. The class Ji is defined as the class of all strategies 
n in H that are minimax for player II in G. 

Definition 5.4.4, Let e be any positive number. The class ®, of 


«Bayes strategies is defined as the class of all strategies 7 in H for which 


> e 
An e-Bayes strategy is an 7 which is within ¢ of being optimal against 
Some strategy & of player I in the original game G; that is, there exists 


a & such that 
M(t, n) < inf Mlo u) + € 
B 


This inequality follows from the definition since 


A*, = sup inf M,(é, u) > —e 
d B 
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implies that there exists a & such that 

inf M,e u) > —e 

P 


or 
inf M (to u) — M(t 1) > —e 


Definition 5.4.5. A strategy 7 in H is called extended Bayes if \*, = 0. 
The class of extended Bayes strategies is designated by @o. 


A strategy 7 is extended Bayes if and only if it is e-Bayes for every 
e>0. In view of Theorem 5.3.2, the class @o can be characterized as 
the class of »’s for which G, has the value zero. 


Definition 5.4.6. A strategy 7 in H is called Bayes if it is extended 
Bayes and there exists a £ in Z with A,(f) = 0. The class of Bayes 
strategies is designated by @. 


A strategy n is Bayes if there exists a & against w. 


hich it is optimal 
in the original game G; i.e., 


MG, n) = min MG, p) 


This follows from the definition and Theorem 5.3.2. 


Definition 5.4.7. For e> 0 a 


strategy 7 in H is called «admissible 
if v*, > —e. The class of e 


admissible strategies is designated by Qe 


An e-admissible strategy n has the property that there exists no other 
strategy u such that for all x ` 


(1) M(x, u) < M(x, n) — e 
This conclusion follows from the definition since 
u*, = inf sup Mè; u) > e 
implies that ae 
sup [M (x, u) — M(x, )] > —e 


for all y, so that for every u there exists an x such that 


M(x, u) — M(x, n) > —e 
Hence, there can exist no y Satisfying (1) for all x. 


Definition 5.4.8. A strategy 7 in H js called extended admissible if 
v*, = 0. The class of extended admissible strategies is designated by 
Qo. 
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A strategy 7 is extended admissible if and only if it is admissible 
for every e > 0. 


Definition 5.4.9. A strategy 7 in H is called admissible if, for all u 
in H, T,(«) > 0 unless u =n. The class of admissible strategies is 


designated by @. 


Definition 5.4.10. A class @ of strategies is called essentially complete 
if for every » e H there exists an 7 € € such that T,(y) < 0. 


An essentially complete class of strategies has the property that for 
any strategy in H there is one in the class that is at least as good. 


Definition 5.4.11. A class © of strategies is called complete if for 
every u ¢ @ there exists an 7 £ C such that T,(n) < 0 and T,(u) > 0. 


A complete class of strategies has the property that for every strategy 
outside the class there exists one in the class which is better. 

Classes @, ® and complete classes have been considered in the previous 
Section in connection with S games. To illustrate the remaining classes, 
Consider the game G = (X, Y, M) where X consists of all positive inte- 
gers, Y consists of all sequences {a;} = (a1, a2, +++) with a; > 0 for 
all j, and, for any x =neX and y = {aj} £ Y, M(x, y) = an. For 
any «> 0, the class Me = Qe = ® consists of the class of sequences 


{a;} such that, for at least one n, an < e The class M = Qo = Bo con- 


Sists of the class of all sequences faj} which have zero as a limit point, 
i.e., which contain a subsequence (aj, aja +++) such that lim a;, = 0. 


n= o 

As a simple example of an essentially complete class, consider an S 
game in n space in which the set S is convex. Then the class of pure 
Strategies (i.e., all the elements of S) form an essentially complete class 


in the class of all mixed strategies. 


PROBLEMS 


5.4.1. Prove that a strategy n is extended Bayes if and only if it is Bayes for 


very « > 0. Ae ie 
5.4.2. Prove that a strategy 7 is extended admissible if and only if it is -admissible 


for every e > 0. 


5.4.3. Prove that if n €@, then v*, — A*, < © , on , 
5.4.4. A class Q of strategies is said to be closed under equivalence if it satisfies the 


Condition that if n e @ then all strategies u such that » = 7 also belong to ®. Show 


that @ and Qo are closed under equivalence. i . 
5.4.5. A class Q of strategies which is closed under equivalence is complete if and 


Only if it is essentially complete. 
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5.5. Set-Theoretic Relations among the Classes of Strategies 


For any arbitrary game, certain set-theoretic relations always hold 
among the classes of strategies defined in Section 5.4. These are sum- 
marized in 


Theorem 5.5.1. In every game 


M. = Qe De 
U U U 
M = Qo DBo 
U U U 
N Q B 


Proof. (a) Me = Qe Let peMe Then Tylu) <v*, + e Using 
Theorem 5.3.3, we obtain v*, > —e, and hence pe Qe. Conversely, let 
we. Then v*, > —e so that O <v*, + €. Choosing n = y yields 
Tylu) = 0 < v*, + e and hence pe Me 

(b) QD Ge Always \*, < v*,. Hence, whenever \*, > =a > 
=. 

(c) M = Qo. Let uem. Then, by Theorem 5.3.4, v*, = 0, so that 
eGo. Conversely, let ue Qo. Choosing n = p yields Talu) = 0 = v*, 
and hence pe M. 

(d) @ D Bo. By Theorem 5.3.1, v*, <0. If 7 € @o then 0 = \* 
<v*, <0. Therefore v*, = 0 and 7 £ Qo. 

(ec) MIR. Let yew. By Theorem 5.3.1, v*, <0. Suppose 
ngm. Then, by (c), n ¢ Qo and hence v*, <0. Thus there exists a H 
with T,(u) = ô <0. That is 


M(e, u) < M(g, n) +6 


n 


for all x. Since 5 < 0, 7 cannot be minimax which contradicts the as- 
sumption that 7 € N. 
(f) @oD@. Let neg. Then by definition T,(u) > 0 for all p. 
Hence, v*, = inf T,(u) > 0. But, by Theorem 5.3.1, v*, < 0 so that 
H 


u*, = 0 and 7€ @o. 

The proofs of the remaining relationships follow directly from the 
definitions and will be omitted. Further, it can be shown by means of 
counter examples that each set-theoretic relation not stated in Theorem 
5.5.1 is in general false. 


Theorem 5.5.2. In every game G, @, = G: for every e> 0 if and 
only if for every 7 in H the game G, has a value. 
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Proof. If every game G, has a value then for every n, A*, = v*, so 
that the two classes are identical. Conversely, if Qe = G. for all e we 
must show that G, has a value for all n. Assume the contrary. Then 
there exists an y for which \*, < v*,. Choose e > 0 so that A*, < -e 
<v*,. Then (1) n£ @. since v*, > —e and (2) 1 ¢@, since A < e. 
Thus we have exhibited an e with @, = ®, which contradicts the as- 
sumption. 


Theorem 5.5.3. In every game G, Qo = Gp if and only if for each 
7 E€ Qo the game G, has a value (namely zero). 

Proof. If each game G, has a value for 7 £ Qo then by Theorem 5.3.2 
A*, = v*, = 0 and hence Qo C @o. But by Theorem 5.4.1 Qo D Bo 
and consequently Qo = @o. Conversely, assume Qo = @o. Then G, 
has a value zero for every 7 € Qo by Theorem 5.3.2. 


Theorem 5.5.4. In every game G, ® = @ if and only if for each 
7 € @o player I has a maximin strategy in G}. 

Proof. Assume that for each 7 £ Go player I has a maximin strategy 
& Then A,() =A*, = 0. Thus Go C G. But, by Theorem 5.4.1, 
®o D @. Therefore Go = @. Conversely, if Go = @, then, for every 
7 € Go, player I has a maximin strategy by the definition of @. 


Corollary 1. In every game G, Qo = @ if and only if for each n € Qo 
the game G, has a value and player I has a good strategy. 


Proof. This corollary follows from Theorems 5.5.3 and 5.5.4. 


PROBLEMS 


5.5.1. Prove that @ C @ if for each 7 e @o the game G, has a value and player I 


has a good strategy. 
5.5.2. Prove the following theorem: Let ne. (a) If G, has a value then 7 € ®o. 


(b) If in addition player I has a good strategy in G, then 7 € &. : 
5.5.3. Let Gy = (X, Y, M + f) where f is any real-valued function defined on X. 


Let Mg be the class of minimax strategies in Gy for all f. Prove that Mg = M. 


5.6. Conditions under Which the Classes of Strategies Are Complete 


Theorem 5.6.1. In every game the class of strategies Qe (and hence 
M.) is always complete. 
Proof. Let 1¢@. We shall show that there exists a we @, such 


that T (u) <0. Choose a such that T,(u) < v*, $ e. Sincev*, < —6, 
T,(u) <0. It remains to be shown that pe Qe; ie, v*, > —e. That 
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this is true can be seen by substituting v*, + e for T,(u) in Theorem 
5.3.3. 


Theorem 5.6.2. In every game G the class ®, is complete if each of 
the games G, has a value. 


Proof. If, for each n, G, has a value, then, by Theorem 5.5.2, Qe = 
Gea and, by Theorem 5.6.1, Qe is complete. 


Theorem 5.6.3. In every game G, the class Qo (and hence M) is 


complete if for every y in H player II has a minimax strategy in the 
game Gy- 


Proof. For any n let u be a minimax strategy for player II in Gis 
Since v*, < 0 we have T,(u) = v*, < 0. In view of the results in Prob- 
lems 5.4.4 and 5.4.5, it remains to show that ue Qo; ie, v*% = 0. But 
this follows from Theorem 5.3.4. 


Theorem 5.6.4. In every game G, the class @o is complete if for 
every 7 in H the game G, has a value and player II has a good strategy. 


Proof. If G, has a value for each n, then, by Theorem 5.5.3, Go = 
Qo. The completeness of Go now follows from Theorem 5.6.3. 


Theorem 5.6.5. In every game G, the class @ is complete if for every 
n in H, the game G, has a value and both players have goud strategies. 


Proof. Since player I has a good strategy in each G, then @ = Bo 


by Theorem 5.5.4. The completeness of @ now follows from Theorem 
5.6.4. 


Theorem 5.6.6. In every game G, the class @ is a subclass of every 
complete class @. 


Proof. Let ne@. Then since @ is complete there exists a pe œ 


with T,(u) <0. But since ņ is admissible this can happen only if 
7 =p 


PROBLEMS 


5.6.1. Prove that if @ is complete it is the smallest complete class, and if a smallest 
complete class exists it is @. 

5.6.2. Using the results of this section, give an alternativi 
theorem: Let G be an S game in which S is bounded and S* 
@ of Bayes points in S* is complete. 

5.6.3. For each 7 € G, let S, be the class of strategies » e H such that =n. Then 
the sets S, form a partition $ on @ into equivalence classes. 
complete class necessarily contains at least one element fro 
of @. 


e proof of the following 
is closed. Then the class 


Show that an essentially 
m each equivalence class 
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6.6.4. (a) If @ is complete, then every class containing one element from each of 
its equivalence classes is essentially complete. (b) If a smallest essentially complete 
class exists, it consists of one element from each of the equivalence classes of @. 


5.7. Completeness of the Class of Admissible Strategies 


In the previous section we were able to give conditions under which 
many of the classes of strategies defined in Section 5.4 were complete. 
However, none of the conditions considered was sufficient to make the 
class @ of admissible strategies complete in the general case. The com- 
pleteness of @ appears to depend on some properties of the game G 
which go beyond the existence of values and optimal strategies in the 
class of games G,. This is exemplified by the following 


Theorem 5.7.1. The class @ of admissible strategies is complete if 
the game G = (X, Y, M) satisfies the following conditions: 
(i) There exists a sequence {2;} in X such that for every x in X there 
is a subsequence {2;,} of {aj} such that 
lim M(t; 2) = M(a, n) 


mete 
for all y in H. : . 
(ii) For every sequence m1, 72, +- for which Y,,(m41) < 0 for all ¢, 


there exists an 7* such that T,,(y*) < 0 for all 7. 


Proof. If, for some u, M,(x, u) = 0, then clearly M,(2;, n) = 0 for 
allj. On the other hand, if, for some x, M,(x, ») < 0, then, by assump- 
tion (i), there exists a j such that M,(z;, u) < 0. Thus assumption (i) 
implies that Y,(u) < 0 if and only if M,(x;, u) < 0 for every j. For 
any ne H we define H, as the set consisting of all neH such that 


T,(u) < 0. We also define 
2, M (ai, #) 
(1) w= Ue 


t=1 


Clearly, for all » e H, Q(u) exists since M is by definition a bounded 
function. (The quantity Q(«) js the payoff to player I if Player II 
employs strategy u and player I employs the probability distribution 
E = (1/2, 1/4, ---) over the sequence of elements 21, t2, -+ +.) More- 
over, the function Q satisfies the ce fee that Q(u) < Q(n) whenever 
Talu) < ith equality holding only if » = 2- 

j Hirie is T dmissible if it possesses the property that T,(«) 
< 0 implies T,(n) < 0 for all u. Hence it must follow that, for all 
neH — aQ, there exists a » which is better than 7. Also it follows from 
the definition of the classes @ and H, that if pe @ N H, then u = 7 


whenever 7 e H — @. 
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Assume that @ is not complete. Then H — Q # Ø. Moreover, 
there exists an 7,¢ H — @ such that @ N H,, = @. If this were not 
true, then for every ne H — @ we could find a ne @ which is better, 
contradicting the assumptions that @ is not complete. We now select 
an no £ Hn, such that 


Q(m2) < inf Qu) +4 


and consider the set H,,. Then by an argument similar to the above 
we show that H,, = Ø and @ N Hn = Ø. We next select an 73 e Hn, 
such that 


ns) < inf Qu) + 3 


and so forth. In general, for m > 1, we select an nm from H,,,_, such 
that 


Qm) < int Qu) + 
pe Hrm- m 
By assumption (ii) there exists an »* such that T,,(n*) < 0 for all 7, 
so that 7* e H,, for all 7. We shall prove that 7* is admissible, contra- 
dieting the fact that, for all 7, @ N H,,= Ø. Suppose y* is not ad- 
missible. Then there exists a v e H which is better; i.e., 


(1) Tp) <0 and %,(q*) >0 


By substituting 7; for n, n* for u in expression 2 in the proof of Theorem 
5.3.3 we see that T,,(v) < 0 for all k so that v e H,, for all k. More- 
over, in view of (1), v = n; for any k, and hence for all k 


(2) Qln*) < Qn) < QW) + z 


which implies that Q(n*) < Q(%). But, by (1), QW) < Q(n*). Thus 
the assumption that * is not admissible leads to a contradiction and 
the theorem is proved. 

It will be shown in Chapter 7 that the conditions of Theorem 5.7.1 
are satisfied for a large class of statistical games in which the statisti- 
cian has a finite number of possible actions. 


PROBLEMS 


5.7.1. Employ Theorem 5.7.1 to give an alternative proof of the theorem that in 
any S game in which the set S is bounded and 5S* is closed the class @ is complete. 
5.7.2. Prove Theorem 5.7.1 without the assumption of the boundedness of M. 


CHAPTER 6 


Fixed Sample-Size Games 
with Finite © 


6.1. Introduction 


In this chapter we shall consider fixed sample-size statistical games 
G = (Q, D, p) in which the possible states of nature are finite in num- 
ber; i.e., Q = (1, 2, +++, h), where Z = (Z, 9, p) is the sample space. 
For a fixed je Q, we shall designate the probability distribution on Z 
either by p,(z) or by p(z | j). Let A be the action space, and let L(j, a) 
be the loss to the statistician if nature’s state is j and he takes action 


aeA. We define 
(6.1.1) W = {w(a) = (wi(a), +++, wala): wla) = L(j, a), ae A} 


Then W is a subset of h space. 
Consider a game Go in which the statistician takes an action ae A 


without experimentation. Then Go is equivalent to an S game [see 
Definition 2.4.1] in which the statistician chooses a point w e W, nature 
chooses a coordinate j, and the loss to the statistician (i.e., the payoff 
to nature) is w;, the value of the jth coordinate of w. 

Suppose the statistician performs a fixed sample-size experiment and 
employs a decision function deD which maps Z into A. Then, for 
every outcome z £ Z, the loss to the statistician is u,(z) = L(J, d(z)) if 
nature is in state j. Thus, in a fixed sample-size game in which Q is 
finite, a choice of a decision function d e D is equivalent to a choice of 
a vector-valued function u which maps Z into W and is such that for 


any ze Z 
u(z) = (u6), o un(2)) = (Wi, +++, Wa) 


The class of decision functions D can therefore be replaced by a class U 

of vector-valued functions u. 
Now assume that nature i 

fixed sample-size experiment an 


s in state j, the statistician performs a 
d employs a decision function de D or, 
143 
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equivalently, a vector-valued function ue U. Then the risk is given by 


(6.1.2) pC, d) X LG, d(2))p(e | J) 


= © wepe] j) = Ew) 


zeZ 


We now define a set S in h space by 
(6.1.8) S = {p(d) = (e(d), ---, pn(d)): (d) = p(j, d), de D} 


The original game G is then equivalent to an S game in which the statis- 
tician selects a point p(d) e S, nature selects a coordinate j, and the (ex- 
pected) loss to the statistician is the value of the jth coordinate of the 
point selected. 

Note that every decision function u (i.e., random variable) which 
maps Z into W results in a point p(w) in S given by 


(6.1.4) p(u) = (E(u), «++, Enlun)) 


Thus in the game Go the loss to the statistician is the value of a coordi- 
nate of a point in W, whereas in the game G (i.e., when the possibility 
of experimentation is introduced) the loss is an average (expected value) 
of a coordinate of many points in W which are values of a random vari- 
able u, and this random variable is at the choice of the statistician. 

In many statistical problems it is often the case that if Q is finite so 
is the space A of terminal actions, though the number of states of na- 
ture does not necessarily correspond to the number of possible actions 
for the statistician. However, there are important situations where A 
is not finite. As an illustration, consider the problem of estimating the 
proportion of defective items in a lot of size M on the basis of a random 
sample of size N selected from it. Let R stand for the number of de- 
fective items in the lot and w = R/M. Then 2 consists of the M+ 1 
states (0, 1/M, 2/M, ---, 1). On the other hand, for a given loss 
function it may be profitable to consider every number between 0 and 
1 as a possible estimate of w so that A may consist of all points in the 
closed interval (0, 1). 

The general case in which A is finite will be treated in the next chap- 
ter, and several theorems applicable to the present situation will be 
found there. However, the finiteness of 2 which is assumed here and 
the consequent reduction of the game G to an S game will make it pos- 
sible to characterize these statistical games in greater detail, regardless 
of the structure of the space A. 
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6.2. Complete Classes of Strategies in Games with Finite Q 


In Chapter 5, Section 5.2, we have given a detailed characterization 
of the complete classes of strategies for player II (the statistician) in 
S games under the assumption that the game S*, the mixed extension 
of S, is a closed, bounded, and convex subset of h space. That S* is 
convex follows directly from the definition. It remains to investigate 
conditions under which S, and hence S*, is closed and bounded. 


Theorem 6.2.1. Let W be the subset in h space defined by (6.1.1), 
and let S be the corresponding subset in h space defined by (6.1.3). 
Then if W is closed and bounded so is S. 


Proof. The boundedness of W implies the boundedness of |S as can 
be seen from Equation 6.1.2. It remains to show that S is closed. 

Since Q is finite, there is no loss in generality in assuming that Z con- 
sists of a countable number of elements 21, 22, ***- For any deD let 


(1) ar = LG, da), = F=1,%-° 
So that 

@) pjd) = È anpil) 

and jä 

6) p(d) = (x agi Pi (Zn), °°" x anpil) 

In view of the above, a decision procedure de D and hence ue U can 


a sequence of vectors {a1, az, +} in 


be thought of as a specification of 
hat, if zą is observed, the vector az 


Ww with a, = (ap, +++, agn) such t 
1s chosen from the sequence. 

Let {p(dm)} be : seed sequence of points in S. We shall show 
that the limit belongs to S. Let dm = {a1 a™s, +++}. By the Can- 
tor diagonal method we can select a subsequence {dn} with dn = 
fam a™,, ---} such that a”, — afr asn —> © for each k. Alo 
a*,eW since W is closed. It remains to show that p(dn) — p(a*) 
Where d* = {a*;, a*o, ---}. But this follows from Theorem 3.11.5 
Since each coordinate of p(dn) is given by (2) with a,j replaced by 


a™),. f > oe 

uence holds not only for discrete probability distributions 
but also for densities. The proof, however, is beyond the scope of this 
book. In fact, it can be shown that, in the case of densities and under 
the conditions of Theorem 6.2.1, not only is S closed and bounded, but 
it is also convex. The implication of this result is that if yg consists of 
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a finite number of densities the statistician need never use a randomized 
strategy since whatever risk can be achieved with a randomized strategy 
can also be achieved with a pure strategy. More precisely, the pure 
strategies in this case form an essentially complete class. We also re- 
mark that the conditions of Theorem 6.2.1, namely, the boundedness 
and closure of W, are not restrictive and hold in a great many statisti- 
cal problems. As was mentioned before, the boundedness of W follows 
from the boundedness of the loss function L. The closure of W is 
guaranteed, for example, in case A is finite or if A is a bounded closed 
subset of n space and L(j, a) is a continuous function of a for each Í 

We see from Theorem 5.2.1 that if Q is finite and W is bounded (at 
least from below) the class @ of risk vectors arising from Bayes pro- 
cedures can be represénted as certain boundary points of S*—the con- 
vex hull of the set S defined by 6.1.3. If in addition W is closed, then, 
by Theorems 6.2.1, 5.2.3, and 5.2.4, the class @ of admissible risk vec- 
tors and the class D—the closure of the set of risk vectors arising from 
Bayes procedures against é’s with positive components—are complete 
and, moreover, are subsets of ®. It is of interest, therefore, to charac- 
terize those sequences of vectors in W which result in Bayes points in 
the set S. 

Since Q is finite, the space = is an (h — 1)-dimensional simplex—i.e., 
a convex set in S} spanned by the unit vectors 8; = (1, 0, 0, -- -, 0), 
82 = (0, 1,0, +-+, 0), +-+, && = (0, 0, 0, +-+, 1) and every $e Z is given 


by È = (E(1), (2), +++, £(h)), £G) > 0 for alli and Ð £() = 1. Thus, 
isl 

for example, if h = 3, Z can be represented as an equilateral triangle of 
unit altitude, and, if drawn in a plane, the coordinates of a point $ in 
the triangle can be taken as the perpendicular distances from the point 
to the 3 sides. (From elementary geometry we know that the sum of 
the perpendicular distances frorn any point in an equilateral triangle 
to the opposite sides is equal to the altitude, which in our case is unity.) 

For any § €&, let £.,(7) be the a posteriori probability of the state j 
when the outcome of the experiment is z}. Then [see Definition 3.6.4] 


La jaiii 
DX p:e 
i=l 
We shall denote the vector (,,(1), +-+, &(h)) by &,, and the quantity 
h 


> Dilex)E@) by Peler). 
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For any choice of a decision function d or, what is equivalent, for 


any choice of a sequence of vectors ay, a2, ++- in W, the risk p(&, d) is 
given by 
h œ 
(6.2.2) PÈ, d = D> D aped) 
j=1 k=1 
a= (= ata) Piler) 
E Net 


>L (= atta) Pe(ex) 


k J=1 
where for a fixed k 


h h 
(6.2.3) 2 aral) = min, > arja) 
j= ape W j= 


Thus we have the following theorem. 
Theorem 6.2.2. The sequence of vectors 
{a*r} = {(a*n, 0% a*n) } 


in W determined by (6.2.3) for each k is Bayes against Ẹ, provided the 
minimum of az., is assumed in W for each k. 
We observe that, for any 2, and §, the a posteriori probability §., is 


a point in =. Thus for each £ e Æ the sequence of possible observations 
- in Z and hence a 


21, 22, +++ determines a sequence of vectors RR Bay oe a 
a, ey’ ' 
Sequence of vectors a*;, a%2, °** In W such that a*, is Bayes agains 


Sy. A point in S is then a coordinate-wise weighted average of the 
Sequence {a*,}, the weight of the jth coordinate of the kth element of 
the sequence being p(z; |j). It is to be noted that in ye to obtain 
the sequence {a*,} it is not necessary to consider all of W but only a 
subset of W, viz., those we W which are Bayes against some e= and 
which form the extreme points of W *—the convex hull of W. - [See 
Problem 5.2.3.] If this subset contains only a finite number of distinct 
Vectors, then the sequence {a*z} will also contain only a finite number 


of distinct vectors. 
We conclude this section with the follow 
risk as a function of the a priori probability E. 


Theorem 6.2.3. The risk p*(6) from a Bayes decision procedure is a 
Continuous and concave function of A 


ring theorem about the Bayes 
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Proof. By definition, for any §, and any de D, 


h h 
PÈ, d) = X ODE = E oG, DEG) 
J j=l 
where p(j, d) is given by 6.1.2. Hence 
h 
p*(Ē) = inf oÈ, d) = inf D pt0) = inf p- 
a Pes j=1 


e Pes 


Since L(j, a) and hence S is bounded and since 


(6.2.4) inf p- = — sup [—p-§] 
Pes Pes 


the theorem follows from Theorem 2.2.7. 


PROBLEMS 


6.2.1. Let Z = (Z, 9, p) where Z is the set of non-negative integers, 9? consists of 
two elements w = 1 and w = 2; 


Pz | w) = = 
i.e., p(z | w) is a Poisson probability. 
(1) Let A consist of two elements a and ap, let £ = (&(1), &(2)), (1) = 0.6, &(2) = 
0.4, and let the loss matrix W be given by 


ce of a posteriori probabilities {8}, 2=0, 1,2, -6., 
1 ++- resulting from the Bayes procedure. 
(d) Describe the Bayes proce: i ms of a partition of Z, % 

(e) Determine (approximately) the corr 
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(3) Do part 1 with A = (ay, a2, a3); (1) = (2) = 1/2, and 


(4) Do part 1 with A = {a:0 < a < 3}; (1) = 0.3, §@) = 0.7, and 


L(w, a) = 100w — a)’, w= 1,2, 0<a<3 


6.2.2. Consider the sample space £ in Problem 6.2.1. Let A = (ai, a2), and let 


the loss matrix W be given by 


(a) Characterize the Bayes points of S in terms of the probabilities of taking the 
two actions. 
(b) Compute an adequate number 
Set S*, 
Pere For any game G = (9, D, p) in 
the Set W in h space be defined as in equ 
e points in W we define 


of boundary points of S*, and sketch the entire 


which Q is finite, say, 2 = (1, 2, «++, h), let 
ation 6.1.1. In terms of the coordinates of 


wj = inf wa), 7 = SUP wa), j=l A 
T aea aaa 
Consider the interval (IV) in h space defined by 
TW) = {x = (z t 0: W Systm FHhars Mi 


re S is defined by equation 6.1.3. 


(1) whe: 
: Section 4.3] in terms of n(W). 


Clearly W7 c m(IV). (a) Show that S C 
regret [see 


ê xplain the notion of minimax loss oT 
-2.4. Show that W C S. 


ons for Finite Q 
e-size game in which Q is finite, 


f risk vectors which lie 
t S* in an h space. 


6.3. Bayes Soluti 


We have seen that in a fixed sampl 


e class @ of Bayes solutions result in a set : x Se 
on a specified portion of the boundary of a conve? 
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This set can be said to characterize geometrically the Bayes procedures. 
In this section we shall study two other types of representation of this 
class of strategies—one in terms of a partition of the space = of a priori 
probabilities and the other in terms of partitions of the sample space 
Z by means of likelihood ratios in case A is finite also. 

Theorem 6.3.1. Let W be closed and bounded. For any we W 


define 
T(w) = ($ ez: min (E-u) = &-w} 


h 
where the symbol -u stands for J- w,é(é). Then the sets T(w) are 
fat 


closed and convex, and, if w; = We, then T(w,) and T(ws) have no in- 
terior points in common. 


Proof. That T(w) is closed follows from the fact that it is defined 
in terms of equalities of continuous functions. To show that T(w) is 
convex, let §; and Ëz be any two points in T(w), and let 


Ea = a1 + (1 — aE, Sesi 
Then since Ẹ-w is linear in the components of £, we have 


(6.3.1) min (aʻu) > a min (€;-u) + (1 — a) min (2u) 
ueW ue W ue W 


= a(i -w) + (1 — a)(Ë2:w) 
On the other hand, 


(6.3.2) min (a:U) < ba'w = a(1-w) + (1 — a)(&>-w) 
so that 
(6.3.3) min, Eau) = a(§1-w) P (1 = a) (E2-w) = aw 


and hence Ëa £ T(w) which proves the convexity of T(w). To prove 
the remaining part of the theorem, we observe that, for all ë e T(w,) N 
T(w2), we have 


h h 
(6.3.4) min (§-u) = 2 wkl) = X wl) 
ue j= j=1 


That is, the common points of T(w,) and T(wə) must all lie on an 
(A — 2)-dimensional hyperplane in h space and consequently cannot be 
interior points of the sets. 

Theorem 6.3.1 gives the following characterization of Bayes solutions 
when Q is finite. The space = is divided into closed convex subsets 
T(w) which have only boundary points in common. Let Ë be given. 
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Then, if the outcome of the experiment is z, compute §:. If §.e T(w), 
choose the vector w, or, equivalently, take that action ae A which re- 
sults in this vector we W. 

Note that this particular characterization of the Bayes solutions is 
independent of the functional form of p(z |J). More precisely, all sta- 
tistical situations which involve the same structure for A, the same 
number of states for nature, and the same loss function L(j, a) will 
yield the same sets T(w) and consequently the same Bayes procedures. 

As an illustration, consider a statistical situation in which nature 
has 3 states, the statistician has 3 possible actions, and the loss matrix 


is of the following form. 


States of Atas 
Nature 1 2 3 
i 0 50 100 
2 50 0 50 
3 100 100 0 


Here W consists of the 3.points wi = (0, 50, 100), w2 = (50, 0, 100), 


ws = (100, 50, 0). Let & = (é(1), £(2), £(8)) be any a priori probability 
of the 3 states of nature (or, alternatively, the a posteriori probability 
of the 3 states after the experiment has been performed). Then 


£-w; = 50E(2) + 1004(3) 
E-wo = 50E(1) + 100E(3) 
£-ws = 100&(1) + 50E(2) 
Now action 1 will be taken if 
E-w, < Èw and Ẹ-wı < -w3 
i.e., if €(1) > £(2) and (1) > (8). Action 2 will be taken if 
-wa <£-w, and -w2 <§-ws 
i.e., if £(2) > ¿(1) and (3) < 1/3. Action 3 will be taken if 
È-ws < -w2 and -wa < -wi 


i.e., if (3) > 1/3 and £(8) = (1). 
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If the space = is represented as a triangle, the three regions T(wj), 
T(we) and T(ws) are shown in Figure 8. The Bayes procedure for a 
single experiment is to compute the vector £+, locate this point in the 
triangle, and take action ¢ if £. falls in T(w,). 

Suppose in addition to 2 being finite A is also finite; say A = (1, 2, 
++, k). Then in a single-experiment problem, the Bayes solutions [see 


(1/2,0,1/2) 


(0, 2/3, 1/3) 
Figure 8 


also Section 7.3] are as follows: We take action i if 7(7) < 72(j) for all 
j = i where 
h 


LLG, dela), 
635 7,@) =", = DG, DEC) 
Level nea 7 
j=1 


Thus, as we have seen in the previous illustration, Bayes solutions in 
the problem under consideration are characterized by linear inequalities 
in the a posteriori probabilities £2(j). Since only the numerator of the 
second term of (6.3.5) matters, the Bayes solutions can also be consid- 
ered as being characterized by linear inequalities in the probabilities 
(or densities) p(z | Jj). Moreover, we can divide each of the latter in- 
equalities by some p(z | r) and obtain linear inequalities in the likelihood 
ratios p(z| j)/p(z|r). This type of characterization of a Bayes solu- 


tion is especially useful in two special cases which we shall now investi- 
gate in some detail. 
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Case 1. Q = (1,2), A = (1, 2). Here nature’s states are represented 
by one of two probabilities p;(z) and po(z), and the statistician has to 
decide whether the sample point z came from state 1 or state 2. This 
case, known as the case of a simple hypothesis against a simple alterna- 
tive, has been extensively studied in statistical literature. While the 
assumption that nature has only two states is often artificial, the sta- 
tistical procedures based on this assumption are quite general, as can 
be seen, for example, from Theorem 7.4.4 below. 

Let & = (¢, 1 — t) where ¢ is the a priori probability for state 1 and 
1 — ¢ the a priori probability for state 2. The most general loss func- 
tion in this case can be taken as w if nature is in state 1 and the statis- 
tician decides 2, unity if nature is in state 2 and the statistician decides 
1, and zero otherwise. The Bayes procedure is to decide 1 if 


(6.3.6) (1 — S)po(z) < wtp) 


or, equivalently, 
(6.3.7) pile) o W 

mz) L= 

and to decide 2 if the inequality is reversed. 

We note that, for a fixed w, as ¢ varies from 0 to 1, c varies from 0 
to. Similarly, for a fixed ¢, as w varies from 0 to %, ¢ also varies from 
Oto, Thus by varying c from 0 to % we generate all Bayes procedures 
for every possible loss and a priori probability. 

In case the sample space is X = (X, 9, p) with x = (1, ses, ay) 
where the 2,’s are independently distributed with probability (or den- 


sity) f,(x), i = 1, 2, then p:(x) = I] f(x). Taking logarithms on both 
j=l 


sides of (6.3.7) and setting 
fo(x;) 


Eaa) 


z=l 


N 
the Bayes procedure reduces to deciding 1 if 2 zj Sk = loge and de- 
j= 


ciding 2 otherwise. Since for large N approximate probability state- 
ments are often easy to obtain for sums of independent random varia- 
bles, the above reduction is of great practical importance. 

Let Sp be the set of points in the sample space satisfying inequality 
(6.3.7), and let C(Sz) be its complement. Furthermore, let a(¢) be the 
probability that the sample point v will fall in C(S:) when nature is in 
state 1, 6(¢) be the probability that x will fall in S; when nature is in 
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state 2, and let p*(¢) be the Bayes risk as a function of ¢ Then 
(6.3.8) * (5) = walt) + (1 — OLE) 


By Theorem 6.2.3, the function p*(¢) is continuous and concave. 
Moreover, p*(0) = 0, p*(1) = 0 and hence p*(f) will attain a maxi- 
mum at some point ¢ = fp. 

If p:(x) are densities, then p*(¢) when plotted against ¢ will have the 
shape as given in Figure 9. Suppose the statistician believes that the 
a priori probability for state 1 is ¢’ and he operates with the region 


y 


y = walt) = Bs) 


0 p g 
Figure 9 


Sy while in fact some other ¢ is the true probability. Then his average 
risk py:(¢) is given by 


(6.3.9) Pel) = Swa(s’) + (1 — OLE) 


which is linear in ¢ and is tangent to the curve y = p*(¢) at the point 
t=. His loss may be great if ¢ departs from ¢’ by an appreciable 
amount as can be seen from Figure 9. On the other hand, if he is not 
certain of ¢ and operates with the region Sy, then his average loss will 
never exceed p;(5o) = wa(to) = B(to) since the line Y = pr(fo) [see Fig- 
ure 9] is parallel to the ¢ axis and tangent to the curve y = p*(¢) at the 
point ¢ = fo. This, of course, is the minimax procedure. We can sum- 
marize the above results by saying that, if p;(x) are densities, the mini- 
max procedure is obtained by finding that region S for which wa(¢) = 
B(f). This, of course, can be obtained by varying c in (6.3.7). Once 
the proper c is found, the least favorable probability ¢> can be obtained 
by solving for ¢ in the equation wt/(1 — =c. 

In case p,(x) are discrete probability distributions, the curve y = 
p*({) will generally consist of connected line segments. This follows 
from the fact that a whole interval of ¢’s will generally yield the same 
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a and 8. It often happens in this case that the maximum of p*(f) oc- 
curs at a point where two line segments (one with a positive slope and 
one with a negative slope) meet [see Figure 10]. In this case the mini- 
max procedure is obtained by mixing any two Bayes procedures with 
regions, say Sq and Sẹ which yield the line segment a and the line seg- 
ment b respectively [see Figure 10]. More specifically, let aa, Ba be 
the probabilities of errors using region Se and a», By be the probabilities 
of errors using region S»; then, if we employ region Sa with probability 


y 


y=7*8, +0 -7") 8, 


0 rA 


Figure 10 


y and S, with probability 1 — y, the average risk p,(¢) for any ¢ is 


given by 
(6.3.10) py(t) = vifwae + (1 — HBa] + 0 = viwa + (1 — $82] 


ned by finding that y = y* which 


and the minimax procedure is obtai fin l 
The minimax risk has the value 


makes (6.3.10) independent of t. 


"Ba + (1 — y*)Bo . 
Case 2, 2= (1,2, °°), 4= (l2 h), and the loss is a con- 
is j and the statistician decides 7 # j 


stant, say w, if the state of nature 1 : A eee es 
and the loss is zero if i = j. Let the associated probability distributions 
be designated by p:(z), i = 1, 2, +75 h, and let § = (&(1), +++, &(h)) be 


any a priori probability distribution over the h states. Then for any 
decision 7 the a posteriori risk 7 is given by 


h 
w Do vO) 


(6.3.11) m ee 
E nOD 


j=1 
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Hence the Bayes procedure is to decide 7 (i = 1, 2, ---, h) if 
h h 
(6.3.12) D BOADE wO) 
+i=1 j=k=1 


or, equivalently, 


(6.3.13) 


A(z i 
px(2) < HOJ 
pi(z) Elk) 
for all k = 1, 2, ---, h. As in case 1, the Bayes procedures for this 
special multidecision case have a very simple representation in terms 


of likelihood ratios. Illustrative examples of this case will be given in 
the next section. 


PROBLEMS 


6.3.1. Letz = (21, -+ +, Z100) be 100 independent observations on a random variable 
which is normally distributed with variance o? = 25 and mean u. Let Hy be the 
hypothesis that » = 0 and Hə the hypothesis that y = 2. Let = (t, 1 — & where 
¢ is the a priori probability for Hy. 

(a) Plot the Bayes risk as a function of ¢ if the loss matrix is given by 


States of Dean 
Nature iy Np 
Hı 0 25 
He 10 0 


(b) Obtain the minimax procedure, and compute the least favorable £. 
6.3.2. Let z be a binomial random variable with P(e = 1) = w and P(¢ = 0) = 
1 — w. Let Hj be the hypothesis that w = 0.4 and Hp the hypothesis that w = 0.6. 


Let ¢ be the a priori probability for Hı. We wish to test H, against Hə on the basis 
of 10 independent observations on z. 


(a) Plot the Bayes risk as a function of ¢ if the loss matrix is given by 


States of Decision 


Nature 


(b) Obtain the minimax procedure and compute the least favorable £. 
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6.3.3. Draw the Bayes decision regions T(w;), T(we), T(ws) in an equilateral 
triangle for the loss matrix 


States of SoHo 
Nature 1 2 3 
1 0 50 10 
2 50 0 20 
3 100 50 0 


6.4. Illustrative Examples of Fixed Sample-Size Statistical Games 
in Which Q Is Finite 


Example 1. Multiple Classifications. The theory of multiple classi- 
fications or, as it is commonly known, discriminant analysis deals with 
the problem of assigning an individual to one of several possible groups 
to which he may belong on the basis of a set of measurable characteris- 
tics observed on him. For example, we may wish to classify a plant 
specimen into one of two species on the basis of measurements of its 
stem and leaves, or to assign a skull found in archeological excavations 
to some dynastic period on the basis of anthropometric measurements 
made on it, or to determine whether a candidate in a pilot training 
school will be a success or a failure on the basis of aptitude tests. 

From a theoretical point of view, the problem of discriminant analy- 
sis is a multidecision fixed sample-size problem with finite 2. The 
main distinguishing features of this problem are: (a) A decision here is 
usually made on the basis of a single observation on several correlated 
characteristics, and (b) it is not unreasonable in this problem to assume 
the existence of a priori probabilities since they represent the propor- 
tions in which the populations under consideration are mixed. 

We shall first consider the problem of two populations which may be 
formulated as follows: An individual on whom we observe the charac- 
teristics zı, +-+, tv is known to come from one of two populations IT, 
and To, but it is not known from which. The probability distribution 
(or density) of these characteristics is given by pi(t1, +++, æy) if the 
individual belongs to Mı, and Par = xy) if the individual belongs 
to Io. These functions are assumed to be known. We also assume 
that, if an individual comes from M and he is classified into Mə, there 
is a loss of w units, and, if he belongs to I2 and he is classified in TH, 
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there is a loss of v units. If he is classified correctly, the loss is zero. 
Thus, for example, in the problem of training pilots, we might assess 
what it would cost us to reject a possible good pilot (w) and what it 
would cost us to train a “washout” (v). Without loss of generality v 
can be taken as unity. 

We further assume that the two populations are mixed in the pro- 
portion of ¢ and 1 — §, so that, if an individual is chosen at random, ¢ 
is the a priori probability that he comes from I, and 1 — ¢ is the 
a priori probability that he comes from Io. 

On the basis of the above information we wish to set up a procedure 
for classifying an individual into one of the two populations. The opti- 
mal procedure here is obviously the Bayes procedure discussed under 
case 1, Section 6.3, above. In case ¢ is unknown, the minimax pro- 
cedure might be employed. The method of computing the minimax 
procedure can also be found under case 1, Section 6.3. 

Of particular interest is the case where both p;(x, -- +, ty) and 
p2(%, *++, ty) are multivariate normal with the same variance and co- 
variance matrix (¢;;) but different means 6; = (011, +++, 01y) and ba = 
(021, ***, Gav) respectively. That is 


% 1, 
P(@1, +++, 2y) = (rr P |- = DD oF (es — Oi) (a4 — a) | 


t=1j=1 


k = 1, 2, where o” are the elements of the inverse of the matrix (oij) 
and | o” | is the determinant of the matrix (o). In this case, the Bayes 
procedure, after taking logarithms to the base e on both sides of (6.3.7), 
reduces to the following: Classify the individual into TI, if 


N 
(6.4.1) Ha) = Eara i tog = 


t=1 


and into Is otherwise, where 


i 
(6.4.2) Ni = D c” (6; — 0y) 
j=1 


Since the v’s are jointly normally distributed so will U(x) be in each 
population with mean 1(,), k = 1, 2, and common variance 


NNO 
(6.4.3) oa = D D o” (0a — 91%) (82; — b13) = L02) — L01) 


i=l j=1 
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Let a(¢) and 6(¢) be the probabilities of misclassification. Then 


o% 
— (1/2)? dt 


(6.4.4) aœl¢) = al 
1 v 

6.4.5 = f -a/a 
(6.4.5) B(S) ve E dt 
where 

we c 1 we c 
6.4.6 s2 = BRS =o 
Í i oa c = L—¢ f o k =f 2 


as can easily be verified. The minimax procedure is obtained by finding 
that ¢ = {> which satisfies the equation wa(¢) = (¢) and substituting 
the result in (6.4.1). Note that if w = 1 (i.e., the two losses due to 
misclassification are equal) then a(¢) = B(f) for ¢ = 1/2, and the mini- 
max procedure is then simply to classify the individual into I, if I(x) 
< Me) + Ga) and into Iz if the inequality is reversed. 

Before we leave the problem of two populations, the following re- 
marks may be of some interest. 

We are sometimes faced with a situation where we are dealing with 
n individuals who could have come from one of two populations, say 
TI, and Ig, and we wish to choose a subset of them having the greatest 
probability of belonging to Hı. For example, from a group of 100 can- 
didates applying to a pilot training school, we might wish to choose 25 
best prospects on the basis of some tests. Here we have the problem 
of ranking the individuals according to the likelihood that they come 
from I}. 

If we know ¢, the a priori probability that an individual selected at 
random belongs to II, we can compute for the ith individual the a pos- 
teriori probability ta where t: = (tin +++, ti) @ = 1, +++, n) and 
then rank these ¢,,’s. It turns out, however, that we need not know ¢ 
in order to accomplish this ranking. This follows from an easily veri- 
fied theorem that a necessary and sufficient condition that ts; > f,, is 
that 
pi(@yi, °° +, Ni) > Pi(ij, +++, ENa) 


(6.4.7 > 
: Polti +t, Zwi)  P2(Tij ** +, BG) 

Another situation that might be encountered is the following: We 
are given a pair of individuals, and it is known that one belongs to TI, 
and the other to Is, but it is not known which individual belongs to 
which population. For example, suppose two horses are racing against 
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each other and we wish to choose the winner on the basis of some char- 
acteristics observed on the horses. Barring a photofinish, we know that 
one is going to be a winner and the other necessarily a loser. 

Let 2/1, 2’2, ++, T'ẹ and 2’, 29, ---, y be the values of the 
measured characteristics on the two individuals and let Pilti, +++, ty) 
be the probability (or density) of the characteristics in I; @ = 1, 2). 
We observe that here only one error of misclassification is possible, so 
that w =v. Moreover, the assignment of the marks “prime” and 
“double prime” to the two individuals might as well be made by chance 
see Section 8.6], thus giving an a priori probability of 1/2 that the in- 
dividual marked “prime” belongs to Tl, (and hence the one marked 
“double prime” belongs to Iz). The Bayes solution is then as follows. 
Classify the individual marked “prime” into Tl, if 


Poe's, +1, t'w)pi (2's, «+, wN) 


6.4.8) 
t Piles, +++, 2’) po(x"y, +, ay) 


and reverse the classifications otherwise. Because of the symmetry of 
the problem, this solution is also minimax. 

The problem of multiple classifications when the number of popula- 
tions involved is greater than 2 presents no theoretical difficulties inso- 
far as the Bayes solutions are concerned and has essentially been solved 
in Section 6.3. The difficulties arise when we desire to compute the 
risk of misclassification, and this can only be done with moderate ease 
for some very special cases. These difficulties are, of course, not re- 
stricted to discriminant analysis but are inherent in all multidecision 
problems. 

We shall here consider a simple illustrative example of a multiple- 
classification problem. We are given an individual who could have 
come from one of 3 populations II,, Is, II3, and we wish to classify 
him into one of the populations on the basis of N characteristics 2 = 
(x1, £2, +++, @v) observable on him. These characteristics have a known 
multivariate normal distribution in each of the populations with the 
same covariance matrix (¢;;) but different means 0, = (041, +++, Orn), 
k = 1, 2,3. The loss due to misclassification is assumed to be constant. 
The problem is to obtain a Bayes procedure for a given a priori distri- 
bution § = (E(1), £(2), £(3)). 

In view of the results previously obtained [see in particular, equa- 
tions 6.3.13, 6.4.1, and 6.4.2], the solution is as follows: Let 


N 
(6.4.9) Nog = D (jn — 0,9), P>q=1,2,3 
j=1 
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and let 

N 
(6.4.10) lng(z) = bD Apati 


i=1 
Then classify the individual into I, if 
15) (02 11 (0. 1 
21 (82) + a „l ) 


loi (x) < 3 og £2) 

and 
131 (63) + 131(01) &(1) 
Igi(x) < z + log £3) 


Classify the individual into Io if 


121 (82) + lo (81) é(1) 
l 
lo (x) > a E a) 


and 
lz2(03) + ls2(02) &(2) 
lz2(x) 2 log E8) 


Classify the individual into Ig if 


l32(83) + I32(82) (2) 
l32(x) 3 + log 8) 


and 
l31(83) + l31(01) (1) 
Igi(x) > 2 log £3) 


We note that this Bayes procedure involves 3 linear functions of 
normally distributed variables and hence have themselves a trivariate 
normal distribution with means, variances, and covariances which can 
easily be computed. The probability that an individual who belongs, 
Say, to IT, will be classified into Ig or Ia can be computed from a bi- 
Variate normal distribution. : 

Example 2. The Problem of the n-Faced Die. The following prob- 
lem is very similar to the last two discussed above and admits a variety 
of generalizations. We are given an n-faced die in which one side has 
a higher probability of appearing than any of the other sides. More 
Specifically, we assume that one side has a known probability p of ap- 
Pearing and each of the remaining sides has a probability (1 — p)/ 
(n ~ 1) of appearing with p > (1 — p)/(n — 1). The problem is to 
decide on the basis of N tosses of the die which of the n sides is the 


biased one. 
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Suppose we label the faces of the die with, say, the integers, 1, 2, 
<tr, n. Then we will be considering n alternative hypotheses, Hı, Ho, 
-++, Hn where H; is the hypothesis that the face labeled with the inte- 
ger t is the biased one. In view of the symmetry of the problem, it is 
entirely appropriate to label the faces randomly [see Section 8.6] thus 
inducing an a priori probability of 1/n for each of the alternative hy- 
potheses. Moreover, it is also appropriate to consider here a constant 
loss due to a wrong decision. Thus this problem can be reduced to the 
multiple-decision problem considered in case 2, Section 6.3 with &(i) = 
1/n for all 7. 


Suppose the die is tossed N times and the outcome is r = (ri, T2, 
n 


+++) Tn) with > ri = N where r; represents the number of times the 
i=l 

face labeled i has appeared. Then letting p.(7) stand for the probability 

of the outcome r under the hypothesis H. i we have 


N! (ey 
misemi” n—1 


Now according to (6.3.13), the Bayes procedure is to accept H; if 


(6.4.11) pit) = 


pet) _ [(n — I)pp= 
(6.4.12) oe [>] <1 


for all k. Since by assumption p > (1 — p)/(n — 1), the above simply 
implies that we accept H; if r; = max (ris T2, +++; Tn). 

Thus far we have considered examples in which the number of pos- 
sible decisions the statistician can make equaled the number of possible 
states for nature. We shall now consider several examples where this 
situation does not hold. 

Example 3. A Hypothesis with a Two-Sided Alternative. We are 
given a normally distributed random variable with unit variance and 
a mean which can have the value —y, 0, or u. The problem is to decide 
on the basis of N independent observations T1, Tə, +++, xy whether the 
mean is zero. If the mean is zero (hypothesis Ho) and we decide that 
it isn’t zero, i.e., that it is either — u or #, our loss is w. On the other 
hand, if the mean is not zero and we decide that it is, our loss is unity. 
Here we are faced with a situation where Q consists of 3 elements while 
A consists only of 2 elements. 

Let E = (€(1), &(2), £(3)) be the a priori probability for the three 
states (—y, 0, ») respectively. The Bayes solution is to accept the 
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hypothesis Hp if 
e(1)p(@| —z) + EB)pE | u) 
£(2)p(@ | 0) 


and to reject it otherwise, where 


(6.4.13) 


VN 
(6.4.14) p(é| 6) = TE e7 W/2)G—6) 


The inequality (6.4.13) simplifies to 


(6.4.15) E(B) + Ee < E(2Jwe 

which in turn is equivalent to 

(6.4.16) cı(w, §) < Z < ce(w, È) 

where cı and c are the two roots of equation 6.4.15 with the equality 
sign holding. 

Thus the class of Bayes procedures are characterized by intervals 
(c1, c2) on the real line such that if z falls in this interval the hypothesis 
Ho is accepted, and if it falls outside of this interval this hypothesis is 
rejected. The fact that the Bayes procedures are characterized by in- 
tervals follows from a general result [corollary 1, Theorem 7.4.3] to be 
proved later. Moreover, it will also be shown [corollary 3 of the same 
theorem] that this class is admissible. 

Example 4. A Problem in Which Q = (1, 2), A = (1, 2,3). A manu- 
facturer who produces complex items sells these items in lots with a 
guarantee that each item will be well functioning. Before selling a lot, 
he tests N items from it and obtains the random variable f whose values 
are x = (xı, Tə, +++, £y). The decision he makes on the basis of the 
outcome is either to sell the lot as is (action 1) or to reassemble each 
item in it (action 2). Each lot comes to him from a production process 
which is in one of two states [i.e @ = (1, 2)] with known probabilities 
¢ and 1—¢. In state 1, the probability distribution of f is given by 
Pi(x), and in state 2 by po(x) [see Definition 3.4.2]. He will always 
take action 1 if he decides that the lot comes from state 1 and action 2 
if he decides that the lot comes from state 2. Knowing the expected 
Proportion of defective items in both states, the cost of having items 
returned to him due to the guarantee, and the cost of reassembling the 
items in a lot, he assesses his losses due to wrong decisions as wy, dol- 
lars if the lot comes from state 1 and he takes action 2 and wz; dollars 


if the lot comes from state 2 and he takes action 1. 
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Suppose after determining the Bayes procedure and computing the 
Bayes risk resulting from it (which is the best he can accomplish) he 
discovers that this risk is higher than he can tolerate. He then decides 
to introduce a third action, viz. to sell some of the lots at a cheaper 
price but without a guarantee. He assesses his new loss matrix as 
follows: 


States of Actions 
Nature 1 2 3 
1 0 Wie Wi3 
2 We, 0 Wo3 


The problem is: What conditions must the elements of the loss matrix 
satisfy (other than the obvious conditions that w13 < ws and We3 < w21) 
before it will be profitable for him to take action 3, and for what out- 
comes x will he take action 3? The solution is easily obtainable and is 


as follows: Let r2(7), i = 1, 2, 3 be the a posteriori risk if the manufac- 
turer takes action 7. Then 


(1 — £)werpe(x) 


Tz(1) = 
tpi(z) + (1 — $)pe(zx) 
(6.4.17) ei er 
Spi(z) + (1 — &)po(2) 
aoe $wr3Pi(x) + (1 — $)wespo(x) 


Spi(x) + (1 — £)po(x) 


and he will take action 7 if 7,(z) is the smallest of the 3 Tzs. More 
specifically, he will take action 1 if 


po(x) < wiz P2(x) swig 
< and ’ 
Pilz) (1 — Sway Pix) ~ (1 — $)(we1 — wea) 


he will take action 2 if 


(6.4.18) 


(6.4.19) a(x) > twi E Po(x) 3 lwi — wia) 
Pilz) ~ (1 — ¢)wə Pilz) © (1 — Pws 
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he will take action 3 if 

Po(x) twis P2(x) 2 f(wie — w3) 
Pilz) — (1 — £) (wer — w23) pi(x) ~ (1 — Jwz 


We observe that action 3 will never be taken unless 


(6.4.20) 


fwi (wiz — wis) 


(1 = ¢)(wzı — w23) (1 — £)we3 


(6.4.21) 


or equivalently 
(6.4.22) WyQW21 È Wi2W23 + W21W13 


But, if (6.4.22) is satisfied, the second inequality in both (6.4.18) and 
(6.4.19) implies the first, as can easily be verified. Thus we see that it 
will generally be profitable for the manufacturer to introduce the 3rd 
action, provided condition 6.4.22 is satisfied. The profitability of this 
action follows from the fact that the described procedure is Bayes and 
hence must be better than the original procedure which is a possible 
procedure for the new loss matrix. 

Note that (6.4.22) is a condition on the risk matrix only. Hence we 
can state that, in any decision situation where 2 = (1, 2) and A = 
(1, 2, 3) and where the risk matrix satisfies condition 6.4.22, the class 
of Bayes solutions is characterized by three intervals on the real line, 
Iı = (—0, c1), Is = (c1, ¢2), I2 = (c2, ©), where 
SCi — wis) 


wis 
6.4. = ———t Co = 
i (1 — {) (wer — w23) (1—$)we3 


and, for any ¢, the Bayes procedure is to take action j if the likelihood 
ratio po(x)/p,(x) falls in Z. ? ew : r 

An interesting consequence of the above discussion is that in a situa- 
tion where Q and A are both finite, the Bayes procedure may sometimes 
lead us to decide that nature is in state 7, even if it is known that the 
a priori probability for that state is zero. Thus, for example, let the 
loss if nature is in state į and the statistician takes action j, be repre- 
sented by a matrix (wij) (i, 7 = 1, 2, 3) with w; = 0, and let the ele- 
ments of (w;;) satisfy condition 6.4.22. Then the Bayes procedure will 
sometimes lead us to take action 3, even if the a priori probability for 
state 3 is zero. This situation is illustrated in Problem 6.3.3. 

Example 5. Estimation of the Fraction Defective in Finite Lots When 
the Loss Function Is Quadratic. We are given a lot of M items where 
each item js classified as either defective or non-defective. The number 
R of defective items is unknown. It is desired to estimate the fraction 
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w = R/M of defective items in the lot on the basis of a random sample 

[see Section 7.7] of N items selected from it. Let x; represent the qual- 

ity of the ith item selected for inspection; i.e., x; = 1 if the item is de- 

fective and x; = 0 if it is non-defective, and let x = (£1, 2, «++, ty). 
N 

The probability distribution of r = > z; is 


i=l 


(rae) ed 


6.4.24 R) = 
( ) a(r | R) $ 5 
N. 
1 
where ( ” ) = —— = and r is the number of defective items in 
m m!(n — m)! 


the sample and is a sufficient statistic [see Section 8.4]. 

We shall restrict ourselves to quadratic loss functions. That is, we 
shall assume that, if the statistician chooses an estimate ae A and the 
true fraction defective is w, the loss L(w, a) is given by \(w)(a — w)? 
where A(w) > 0 for all w. 

Let £ = (€(0), &(1), -+ +, ¿(M)) be an a priori probability distribution 
on Q. If £ is binomial, i.e., if 


(6.4.25) &(R) = () e®(1 — MR 


for all R = 0, 1, ---, M and 0 < 0 < 1, then the machine producing 
the items from which the lots are formed is said to be in statistical con- 
trol. Let g(r |0) be the probability that r defective items will be found 
in a random.sample selected from the lot when the machine is in sta- 
tistical control. Then, since M — R > N — r, we have 


M—N+r /M — R\ (R M 

6.4.26) g(r|0) = ra- o= /( ) 

¢ s| ) p2 aad | et ( ) N 
Setting R* = R — r and M* = M — N and simplifying, we get 


(6.4.27) grlo) > pa (i 6) ac C ) rd E 


R*=0 


a ra — on- 


so that g(r | 8) is also binomial. 


i] 
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For any & and any estimate a, let 7,(a) be the a posteriori risk [see 
Definition 3.6.5] for the quadratic risk function. Then 


M-N+r R 2 
E MR/M) (a z =) a(r | PJER) 
R=r 
(6.4.28)  7,(a) = M—N+r 


= alr | R)E(R) 


which, when expanded, is a quadratic function of a and attains a mini- 
mum at 


M—N+r R 
p> ap R/C | R)E(R) 
(6.4.29) as i 
E A(R/M)g(r | BER) 
R=r 


Thus the Bayes estimating procedure is to choose the function d with 
values d(r) given by the right-hand side of equation (6.4.29). 

As illustrations, consider the case where Alo) = 1 for all w and (R) 
is given by (6.4.25). Then a simple computation shows that 


N 
(6.4.30) d(r) = (1 - z tt â 


where ô = r/N is the usual estimate of w. : 
Example f Minimax Estimate of œ When the Loss Is Quadratic. 


Of considerable interest is the problem of finding a loss function Llo, a) 
for which the minimax estimate d is the commonly employed estimate 
ô = r/N. We shall show that 4 is minimax for the quadratic loss func- 
tion with A(w) = 1/o(1 — w) and that nature’s least favorable distri- 
bution is given by (R) = 1/(M + 1) for all R. ‘ae 

We see from the assumed form of L(w, a) that the only estimating 
Procedure d(r) that yields finite risk for all œ is one for which d(0) = 0 
and d(N) = 1, unless, of course, the states w = 0 and w = 1 are con- 
Sidered impossible. We also note that L(0, 0) and L(1, 1) are unde- 
fined. For reasons that will become apparent shortly, we set 


M-N 
(6.4.31) L0, 0) = LG, ) = Fo- 
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If the statistician employs ô as an estimate of w, then his risk p(w, ô) 
for w Æ 0 or 1 is given by 


ZG i Gr) 
w(t a) Ca) 


os M-N 


o(l— w) N(M -=1) 


Thus, in view of (6.4.31) the estimate ô yields the constant risk 
(M — N)/N(M — 1) for all w. 


It remains to be shown that 4 is Bayes against the a priori distribu- 
tion (R) = 1/(M + 1) for all R. Now by (6.4.29) the Bayes esti- 


mate is given by 
an a M te — 3 w 
ror M — R\N =r r 


MNT Mm? M — R\ /R 
EEN 
Ror R(M — R) \N -r r 


MATE (M — R — 1)!R! 
1 ra (M -—R-N+r)(R-r)! la 
UAR M-RE- Mb 
R= (M-R-N+r)(R-r)! 
We now use the fact that 


E [Te M 
BS, aan ee ‘es a(1 — 6)M—R ag = 1 


to derive the identity 


x Ga Ta + A) Ta + RG +M- R) 
r= \R/T(a)T(@)  re+8+M) ` 


(6.4.32) p(w, 6) = 


(6.4.33) d(r) = 


X Tha + R)T(6 +M — R) _ TTP) (a + B + M) 
Roo RM — R)! i MIT(a + 6) 


or 


X (e +R- 1)(86 +M -— R- 1)! TTB) + B + M) 
a R\M — R)! MT(e + 8) 
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Replacing R by R — r and M by M — N, we obtain 
(6.4.34) as (@atR—-r-)IG@+M-N-R+r-1)! 
ae R-)M-N-R+7)! 
= T(a)P(6)T(a + 8 + M — N) 
(M — N)IP(a + 6) 
Substituting a = r + 1, 8 = N — r, in identity (6.4.34), we obtain 
Te + DPW — rM + 1) 
= (M-NI +1) 
Similarly, putting a = r, 8 = N — r in (6.4.34), we get 
TOEN — NEM) 


(M — N)'IT(N) 

Applying the above expression for a and b to (6.4.33), we obtain 
(6.4.35) ae eee 
MN N 


Thus ô is Bayes against the rectangular distribution and hence is a 
minimax estimate for the given quadratic risk function. 


PROBLEMS 


6.4.1. We are given two coins and are informed that one of them is biased, i.e., 


has a known probability p > 1/2 of falling “head,” and the other is unbiased. The 
problem is to decide which of the two coins is biased on the basis of N tosses of each 
coin. Assuming a constant loss w due to a wrong decision, determine the minimax 
Procedure for thi lem. 

6.4.2, ie in Problem 6.4.1 the possibility of deciding that the 
Tesult of the experiment was inconclusive and assume that the loss due to this deci- 
sion is v with w > 2v. Give a minimax procedure in this case. 

6.4.3. Give two geometric interpretations of condition 6.4.22 in Example 4, Sec- 
tion 6.4. Generalize this condition to the case A = (1, 2, 3, 4). TTNA 

6.4.4. Show that in Example 6, Section 6.4, nature’s least favorable distribution 
E(B) = 1/(M + 1) is equivalent to the following. Nature first selected R with the 
binomial probability (6.4.25) and then selected 0 with a rectangular distribution over 
the interval (0, 1). 

6.4.5. Suppose that in Example 6, Section 6.4, we take 


M-N M-N 
L(0, 0) < -=n L0, 1) < Nar- 


N(M — 1) 
Show that in this case ô is still a minimax estimate of w, but that nature has no least, 


favorable a priori distribution. 
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6.4.6. Find the minimax estimate of w in Example 6, Section 6.4, if L(w, a) = 
(w — a)?. Hint. Show first that there exist constants a and £ such that 


Elav + B — u)? = 6 
Next show that the estimate d = aa + £ is Bayes against 


b; 
ER) -f i r)a — gua Le + 9) con (1 — 9) do 


for an appropriate choice of a and b. 

6.4.7. Let G = (Q, D, p) be a game with © = (1, 2, -+-, h) and a loss function L. 
Let @ be the class of Bayes procedures for G. Consider a class of procedures @’, 
each of which is Bayes against § = (1/h, ---, 1/h) for one of the class of loss functions 


£G, a) = NL(j, a) 
n 
with A; > 0 for all j and D2 A = 1. Show that G = @'. 
j=1 


6.4.8. Let Gy = (Qy, Dy, py) be a fixed sample-size game based on N observa- 
tions. Let the cost of taking N observations be given by c(N). Show how to obtain 
the optimal N and dy for any a priori distribution ¢ if the total risk is given by 
c(N) + pnl, dy), dn £ Dy. 

6.4.9. Let Gu = (Q, D, pu) be a fixed sample-size game in which Q = (1, 2), A 
(1, 2), L(1, 1) = L(2, 2) = 0, L(1, 2) = 1, L(2, 1) =u>0. Let S* be the convex 
hull of the set S of attainable risk vectors in 2-space for the case u = 1. (a) How 
is the lower boundary of S* related to the Neyman-Pearson theory of testing a sim- 
ple hypothesis against a simple alternative? (b) Show that this lower boundary (ex- 
cluding those points lying on the axes) can be generated by considering the minimax 
solutions of G, for all values of u between 0 and œ. (c) Sketch the lower boundary 
of S* for the case Z = (a1, 22, z3), 


piles) = piles) = 1/2, piles) = 0; — pale) = p(z) = 1/2, pole) = 0 


CHAPTER 7 


Fixed Sample-Size Games 
with Finite A 


7.1. Introduction 


_ This chapter will be devoted to the study of a class of fixed sample- 
size statistical games in which the number of possible actions that the 
statistician can take is finite. (Games in which the number of possible 
actions is finite are commonly referred to as multidecision games.) 
More specifically, we shall consider a sample space Z = (Z, 9, p) and 
games G = (Q, D, p) in which Q is a parameter set determining a class 
of probability distributions Ce on Z, the class of actions A is finite, 
ie.,'A = (1, 2, +++, k), the class D consists of decision functions d 
mapping Z into A, the risk function p is given by Definition 3.5.3, and 
the statistician performs a fixed sample-size experiment, resulting in a 
Point ze Z on the basis of which he chooses an action employing a de- 
cision function de D. 

The number of clements in 2 may be finite, denumerable, or non- 
denumerable. The special case where @ is finite has already been 
treated in Chapter 6. Whether 2 is finite or not, as always, the proofs 
of theorems will be carried out under the assumption that for each of 
its elements p, is zero except on & countable set of points 21, 22, * ++ in 
Z. Also, if mixed strategies enter in a proof, it will be assumed that 
they are zero except on a countable number of elements in the respec- 
tive spaces Q and D. In many instances, the reader will have no diffi- 
culty in supplying the corresponding proofs in case Z is an n space and 
Pw is a density, or nature’s mixed strategy £ is a density defined on an 
2 space, or both are densities. This will usually involve simply replac- 
ing a summation sign by an integral sign. 

In case of mixed strategies for the statistician, the very nature of 
the space D will preclude the considerations of anything but mixtures 
on a countable set of elements of D. However, 
for finite A this is the most general type of randomization that the sta- 


tistician need consider. 


it can be shown that 
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7.2. On the Equivalence of Two Methods of Randomization 


In Section 3.6 [see Definition 3.6.6] it was stated that the statistician 
has at his disposal two methods of randomization. He can either select 
an 7 from a class H of randomized strategies, i.e., mix a sequence dy, 
də, ++- of decision functions in the proportions \j, A2, ++- with A; > 0 
and 2); = 1, or he can choose an element ye where ¢ is defined on 
AXZ, o(a|z) >0 and D> ¢(a| z) = 1, and, for every outcome z of 

aed 


the experiment, he takes action a with probability g(a|z). We shall 
now prove that these two methods of randomization are equivalent in 
case A is finite [see also Theorem 8.3.1]. 


Theorem 7.2.1. Let G = (Q, D, p) be a fixed sample-size multi- 
decision game (i.e., a game in which A is finite), and let T = (=, H, p) 
be its mixed extension. Consider another game T* = (=, ®, p) where 
each element yg of # has the property described above. Then T* is 
equivalent to T in the sense that, for every n e H, there exists an ele- 
ment ge, and, conversely, such that p(w, n) = p(w, o) for all we Q 
and hence for all ¢ e Z. 


Proof. Since by assumption A = (1, ---, k) every element de D 
can be considered as a partition of the sample space Z into k mutually 
k 


exclusive sets S1, ---, S, with U S; = Z, such that, if z e S; action ¢ is 


i=l 
taken [see Section 3.5]. Hence, the value of the risk function p can be 
written as 


(1) olo, d) = È Lo) 2 pela) = È E Lowll apelo) 


where Wil 2) = T if me wand wila = = 0 otherwise and L;(w) = 

L(w, i) is the value of the loss function given by Definition 3.5.2. Con- 

sider a mixture of a denumerable number of elements of D, say dı, d2, 

Then to each d; there corresponds a function y; on A X Z of the 

type considered. From mixing dı, də, -++ in the proportions 1, As, 
+; àj = 0 and 2); = 1; we get a procedure ņ with 


(2) p(w, n) = 5 £ È Lilli | ape | w) 


j=1 i=l z 
- EE mle) [Ewela] 


where the interchange of summation is justified by the fact that the 
series is absolutely convergent. 
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Now, for a general ¢ the risk is given by 


k 
(3) plo ) = È È Lilet | pE | o) 
z i=l 
Thus from (2) and (3) we see that any 7 £ H determines a ọ, namely 
(9 Drwilé | 2) = eG | 2) 
j=l 


which satisfies the condition of the theorem. 
We must show now that given a p we can find a probability sequence 
Ai Ao, +++ and a sequence {¥;} which satisfy (4) for each ¿and z. Choose 


R ey 
n g k 


Fix z and define Yn(i | 2) and On(¢ | 2) inductively as follows: 


alil 2) = elil 2) 
6*;(z) = max [4 (1 |2), +++, Ak |2] = A(t | 2) 


where 7, is the smallest integer ¢ (@ = 1, 2, ++, k) for which 6*1(2) = 


bli | 2). , 
hala =1, hël =0 for i#ù 
Having defined y(i | z) and 8;(i | 2) for j < n, let 
n—l 
(5) balila) = gild- > rwil| 2) 
j=l 
(6) oa(2) = max [PCl | 2), +++» OC | D] = alin | 2) 
where 7, is the smallest integer i @=1,2, °°) k) for which 6*,(z) = 
On(i | z). 
(7) Vain |2) = 1, hëls 0 for tA 


We shall now show that 


E Anali | 2) = | 2), 7=1,2,-°:,% 
n=l 


Since 


o k 
E dwvalé | 2) =1= eil?) 


k 
tet i=l 


i 


m 
NA 


n= 
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it will suffice to prove that 
E Awvalt | 2) < eG 2), i=1,2 -= 
n 


However, to prove this inequality, it is enough to show that 0,(z | z) is 
non-negative for all n, as can be seen from (5). We prove this by in- 
duction. We know that oli | 2) is non-negative. We shall assume 
0;(% | z) > 0 for j < n and prove that 0,44(¢ | z)>0. Now 


ae = k —1\"?71 
8) XE wail = Day=1-( ) 
i=l j=1 ju k 
Also, summing (5) with respect to z yields 
k k k n-1 
(9) bs Onli | z2) = 2, et | 2) — a Da Aw, (t | z) 
ful i=1 i=l j=l 
k n—1 
=1- Z Lwil) 
i=l j=1 


Thus, from (8) and (9) we have 
k k — A 
(10) ¥ nti 2) = (=) 
i=l 


By the induction hypothesis, 0,(i | z) > 0 for all 7. Hence, in view of 
(10) we have 


— n=l 
(11) 6*n(z) = max [0,(1| 2), +, An(k | 2] > -(=) =n 


Now consider 


(12) On41(¢ | 2) 


Il 


eila) — Dawilz 
j=1 
n=l 


e(é| 2) — E agil 2) — Arpali | 2) 
j=1 


= nli | 2) — Andale | 2) 
Case 1. iin Then, by (7), ¥n(é|z) = 0, and, by (12), On41(¢| 2) 
= 0,(i| 2), so that by the induction hypothesis On41(¢ | 2) > 0. 
Case 2. i=in. Then, by (7), Yn(in|z) = 1, and hence, by (12), 
(6), and (11), 
On4i(t | 2) = Onlin | z) —h = 6*,, (2) M20 
This completes the proof of the theorem. 
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. Corollary 1. If for all w, a risk p(w, 7) is attainable with an n € H, it 
is also attainable by employing a mixture of a denumerable number of 


1/k-—1\" 
elements dı, do, ++- of D in the fixed proportions \j = z =) 


@ =i +). 
Proof. The proof of this corollary follows directly from Theorem 
7.2.1, 


PROBLEMS 


7.2.1. Let Z = (0 < z <1), A = (1, 2) and ofl |2) =z, o(2|2)=1—2 Con- 


struct a sequence {yj} satisfying (4). 

7.2.2. Show that there is no sequence of numbers My, dz, **"s with An > 0, Dàn = 1, 
such that, for every sequence 1, #2, **"» with un > 0, Zun = 1, the series DAn can 
be partitioned into subseries 2’An, D/An, *** With >), = p How is this related 


to corollary 1 of Theorem 7.2.1? 


7.3. Bayes Solutions in Fixed Sample-Size Games with Finite A 


Let G be a fixed sample-size multidecision game, and let T = (Z, H, p) 
be its mixed extension where p is defined on Z X H. In view of Theorem 
7.2.1 we can replace H by & so that T = (Z, ®, p) where p is defined on 
=X. The problem of characterizing the class @ of Bayes solutions 
for G is the problem of finding for each £Z, a y* e® which minimizes 
P(E, o). We prove the following 


Theorem 7.3.1. For any £ = let 
E Lio) | ©) 
O = SF pel oe) 


be the a posteriori risk [see Definition 3.6.4] if z is observed and action 
i is taken, Then a Bayes decision procedure o* is given by °C | 2) 
= 1 for an i for which 7-(i) is the minimum of rz = {r2(1), «++, t2(K)} 
and ọ*(i | z) = 0 for all other a's. 


Proof. For any £ and ¢ the value of the risk function p is given by 


(1) ot, 9) = DL £ Li(o)ple | oeli | 2)E@) 
w z i=l 


Since the series on the rigbt is absolutely convergent, we can inter- 


change the order of summation and obtain 
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k 
(2) PE, 9) = DD Lilo)pl | ogli | 2E) 


w i=l 


= 2 Zreli) E pe | «)e@) 


But the expression on the right is clearly minimized if for every z we 
set (i | z) = 1 for an i for which 


(3) Ta(î) = min [7:(1), ---, T2(k)] 


and (i| 2) = 0 for all other ïs. This proves the theorem. 

Theorem 7.3.1 states that a Bayes solution is obtained by computing 
for each outcome of an experiment the value 7,(z) for each 7 (note that 
only the numerator of the expression for 7.(z) is relevant) and then se- 
lecting that action which corresponds to the smallest coordinate of Te. 
In case there is more than one minimum coordinate in the vector Tx 
it is immaterial which one of these is chosen. 

As an illustration of a Bayes solution in a fixed sample-size multi- 
decision game, consider the following: A manufacturer of cloth is faced 
with the problem of disposing of each bale as it is produced. The 
quality of a bale of cloth is measured by the average number of defects 
per yard of cloth in the bale, where a defect is defined as either a loose 
thread or a transparent spot. 

For each bale, let w stand for the average number of defects per yard 
of cloth. If w were known to the manufacturer he would behave as 
follows: Whenever 0 < w < r he would sell the bale as “first quality” 
for $500.00 (action 1). If r <w < she would sell the bale as “sec- 
onds” for $300.00 (action 2). However, if w > s he would sell the bale 
as “rejects” for $100.00. The user of the cloth eventually knows the 
quality of each bale. He agrees to pay the manufacturer the sale price 
of a bale, provided its quality is as specified or better. However, if the 
quality of a bale is worse than specified, he pays the manufacturer the 
actual value of the bale but imposes a penalty equal to the amount he 
was overcharged. 

The “payoff” matrix in terms of actual money income to the manu- 
facturer is of the following form: 


Quality of 
Bale Action 1 Action 2 Action 3 
O<w<r $500.00 $300.00 $100.00 
r<w<s 100.00 300.00 100.00 
w>s —300.00 —100.00 100.00 


In terms of “loss” or “regret” [see Section 4.3] the payoff matrix is 
given by 
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Quality of 


Bale Action 1 Action 2 Action 3 
O<w<r $ 0.00 $200.00 $400.00 
rsw<s 200.00 0.00 200.00 

w>s 400.00 200.00 0.00 


(It is to be remembered that the loss matrix is obtained from the in- 
come matrix by subtracting for each state of nature the actual income 
from a given action, from the maximum income that could have been 


obtained if that state were known.) 
For every w, the probability of getting x defects per yard, x = 0, 1, 


2, +++ is assumed to be given by the Poisson distribution 
ew” 
p(t |e) =— 
x! 


From past experience, the manufacturer also knows that the a priori 
probability distribution of w is given approximately by the density 


Hu) =e", = O Sw <a) 


The manufacturer is willing to examine N yards of cloth in each bale 
for a decision. What is an optimum (i.e., Bayes) procedure for him to 
follow? For any outcome of the experiment x = Ba 5 ty) he has 
to compute 7,(i) i = 1, 2, 3 and choose the action corresponding to the 
smallest coordinate of tz. Now for a Bayes solution it is immaterial 
whether the income or loss matrix is considered. Hence, employing 
the latter, we have 


200 fer erm dik 400 f eT ND dw 
s 


r 


Tz(1) = 2 
f eT N tD ogm dw 
9 oO 
200 | e70 +99 do + 200 f eT NANA" dan 
re(2) = —— z 
T e-NDM ey 
r . e 
400 f e7 N AD Ey" dw + 200 f e NEDO” dw 
r 
ros 


- 
[roan 


0 


N 
where m = 5 Ti. 
i=l 
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Comparing the coordinates of tz, we see that the manufacturer will 
take action 1 if 


ao r 
(1) f eT NHD) am dw sf e~N+De ym dw 
r 0 
and 
20 r 
(2) f eT NHD do Ji eT N De,m do 
8 0 
He will take action 2 if 
Tr ao 
(3) f eT (N+1) wm dw <f eT NHD) u ym dw 
0 r 
and 
o s 
(4) J e+e m dw <f eT NHD om dw 
s 0 
He will take action 3 if 
r %0 
(5) f eT NH) u m dw <f eT N +1) wm dw 
0 8 
and 
s a 
(6) f eT NHD M do al eam do 
0 s 


Let mı be a value of m (not necessarily integral) for which equality 
holds in (1). Then the inequality will hold for all m < mı. For, let 
ô be any positive number; then 


i) r 
(7) f eT N41) a, ym pi dw -f eT NHD) u jmp dw 

r 0 
If we set r = w, the integral on the left-hand side of (7) will decrease 
while the integral on the right-hand side will increase. On the other 
hand, if m > mı, the inequality will be reversed. It is clear that, for 
m < mı, the inequality in (2) will surely hold. Similarly, let mə be a 
value for which equality holds in (6). Then, by a similar argument, 
we see that the inequality will hold for m > ms in both (5) and (6), 
and these inequalities will be reversed if m < mz. It follows also that, 
for an m between m; and me, the inequalities in (3) and (4) will hold. 
Thus we can characterize the Bayes procedure for this problem as fol- 

N 


lows: Compute m; and mz in the manner indicated. Then, if >> x; < mı, 
isl 
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N N 
take action 1; if m < do z: < ms, take action 2; and, if È, 2; > me, 
i=1 isi 
take action 3. The values of mı and mg may be computed with the 
help of tables of the incomplete gamma function. 
The above is a special case of a general result [Theorem 7.4.3 and 


corollary 1] to be proved below. 


PROBLEMS 


7.3.1. With a given diet A the weight z of 8-month-old hogs is normally distrib- 
uted with mean » = 250 Ib. and a known variance a°. An alternative diet B is sug- 
gested which is expected either to increase the mean weight or leave the mean weight 
unchanged. In either case, diet B is not expected to affect the variance. With diet 
A the cost of feeding for the specified period is $30.00 per hog; with diet B it is $32.40 


per hog. Hogs are worth $16.00 per hundred pounds. r : 
A farmer who ordinarily raises a crop of N hogs decides to experiment with the 
new diet. He selects a sample of n hogs on which to base a decision. Assuming that 


the farmer wishes to maximize his yearly expected profit, show that a class of Bayes 


strategies for this problem is as follows: Let k be a real constant and Z the arithmetic 
average weight of the n hogs. Then adopt diet B if Z > k, and reject diet B if 
<k. 

7.3.2. A buyer measures the quality of lots submitted to him for inspection by 
the proportion p of defective items in each lot. If p < po, he wishes to accept a 
submitted lot. If p > po, he wishes to reject it. If he accepts a lot, his loss is meas- 
ured by a function Z;(p) which is positive for p > Po and zero for p < po. If he 
rejects the lot, his loss is measured by 2 function Lp) which is positive for p< po 
and zero for p > po. Assume that the buyer makes his decision on the basis of N 
observations and that the probability of getting r defective items in a sample of N 


is gi r g e k 
is given by the binomial distribution ee ) pil — p). Show that the class of 


Bayes solutions for this problem is of the form: Accept the lot if r < a; reject the lot 
if r > a, where a is an integer. 


7.4, Fixed Sample-Size Multidecision Games Where @p Is the 
Exponential Class 
statistical literature belong to a 


Many of the distributions studied in l f 
de A probability distribution Pe is 


class known as the exponential class. 
said to be of the exponential type if 
e°* (x) 


(7.4.1) ple |w) = Blo) rle) = TA 


where r is a non-negative function depending only on the real number x 
and w ranges over an interval on the real line. Examples of distribu- 
tions which are of the exponential type are the binomial and Poisson 
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for the discrete case, the normal with known variance, and chi square 
for the case of densities. 


Theorem 7.4.1. Let 21, v2, ---, ty be the values of N numerical 
random variables which are independently and identically distributed 
according to the exponential type. Then the probability distribution 

N 


of the random variable t where t(x) = >> a; is also of the exponential 


i= 


type. 
Proof. Lets = (a, --+, zy), and let 
q(u| w) = Pyle: ex) = u} 
Then 


N 
au |o) Blo) Nee T] r(x) 
i=l 


lz: (z) =u} 


N 


=B D re) 


(z: t(z)=u}] i=1 
= [BAR (u) = Bo(w)e*R(u) 


which completes the proof of the theorem. 
Let X = (X, 9, p) be a sample space where, for each w £ Q and < € X, 


E ï 
(7.4.2) ple | w) = Boe * TI re) 
isi 

i.e. , the coordinates of each x e X are the values of N numerical random 
variables which are independently distributed according to (7.4.1). 
Let 3 = (T, 9, 3) be another sample space in which 7 is the set of real 
narihes determined by the range of the random variable ¢ with ux) = 
> zi By Theorem 7.4.1, Qo consists of elements defined by 


t=1 
alto) = Bo(w)e"R(t) 


Theorem 7.4.2. Let G = (Q, D, p) be any multidecision game in 
which the elements of Eg are given by (7.4.2), and let @ be a class of 
randomized strategies on A X X. Let W be the class of randomized 
strategies y on A X T. Then for each ge® there exists a wew and 
conversely such that p(w, 9) = p(w, Y) for all weQ. 
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Proof. Let y be given. For any we T we define y(t | u) as the con- 
ditional expectation of ọ, given 7 and u; i.e., 


WG | u) = p2 eli | x)p(e| &)/ 2 pæla) 


oy 
where Su = {«: U(x) = u}. Now since Bloe 7 "> 0 we have for 
all ¿ and u 


N 
yilu) = > eli | 2) J r(@)/R@) 


so that y is independent of w for all ue T. Also yll u) > 0 for all z 
k 


and wand >> y(i | u) = 1 for all u. Consequently, y € v. Now 


t=1 


k N 
p(w, 9) = » E Lilet | De II res) 


jel 


k N 
= © Lilo) 2 Bolo) e” 2 eli |2) H r(x) 


i=l 


= > > Li(o)Bo we" (uv | u) 


i=slueT 


k 
= Z LioE| Daule) = el Y) 
i=lueT 
of the first half of the theorem. The con- 


[See Problem 7.4.4.] 
of the more general Theorem 8.3.2 to 


which completes the proof 
verse is left as an exercise. 
Theorem 7.4.2 is a special case 
be proved later. 
_ In view of Theorem 7. 
in which the observations are indepe 
(7.4.1) can be treated as a game in W 
2, p) where X is a set of real numbers. 
type are known as games with a numerical 


Theorem 7.4.3. Let G = (Q, D, p) be a multidecision game with a 
numerical-valued single observation in which the elements of Eg are of 
the exponential type (7.4.1); © ranges over an interval 2 = (a, b), the 
k elements of the action space A can be so labeled as to permit the sub- 
division of the interval 2 into k consecutive subintervals T1, Ig, +++, Ip 


4.2, every fixed sample-size multidecision game 
ndent and distributed according to 
hich the sample space is X = (X, 
Multidecision games of this 
Lvalued single observation. 
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(some of which may be empty and some consisting of a single point) 
with U I; = Q such that 

i Tor all w £ I; Li(w) = min L;(o). 


2. If the integer j lies between s and i, L;(w) < L,(w) for all we I;. 
[See Fig. 11 for illustrations of conditions 1 and 2.] 


Loss functions for monotone procedures 


Figure 11 


Then a complete class @ of decision procedures is a class of monotone 
procedures, i.e., procedures defined by a set of numbers ap < xı < x2 
<+++< xr (with z = —~ and a, = + permitted) such that action 
i is taken only if the outcome of an experiment is a point in the closed 
interval (w;_1, xi). Moreover, if the inequalities of condition 2 on the 
risk function are strict, then, given any decision procedure that is not 
monotone, we can find one that is monotone and that is uniformly bet- 
ter, i.e., that has smaller risk for all w. 


Proof. Before we prove this theorem we require the following 


Lemma 7.4.1. Let @¢Q and c be given, with O0 <c <1. Then 
among all those ¢’s which satisfy 


(a) 0<) <1 foralle 
() DX oO ra) = c 
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there is a unique ¢ that both maximizes LX 9(0)B(w)e“*r(2) for all 
z 


w <6 and minimizes the same sum for w > 6. Furthermore, there 
exists an xz = x, such that we may take 4(x) = 1 for x < 2, and (zx) 
= 0 for z > te 

Proof. We shall construct such a ¢ and then show that it satisfies 


these conditions. 
Let x, be the maximum of the numbers a such that P(x < a) < c. 


Define 4(x) = 1 if x < £e, (£) = 0 for x > te, and if p(x. | 6) > 0 
c — Pola < x) 
p(we |0) 


It is easily seen that 4(x) satisfies conditions a and b. From condition 
b and the fact that p(x. | 6) = 0 implies Po(z < z.) = c, we have 


D E e) — LOrE) = zu lele) — ABOE) 


T <LTe 


(1) O(%e) = 


if p(v-| 0) =0. If p(w. | 0) #0 we include the x, terms in either the 
left-hand side or the right-hand side of (2), depending on whether 
Elt.) — olte) > 0 or (xe) — olz.) <0. With this convention we see 
that each term in the right- as well as left-hand sums of (2) is non- 
negative. We now multiply both sides of (2) by [6() /8(6)\e~?* and 


get 
(3) È [¢(@) - o(x)] (wet r(x) 
2 <ite 
= Łe) -= GBC) Tt p(x) 


T>Te 
(o-Ore+8 < oet according to whether w Ś 0, and 


Since, for x, > 2, € i r c 
for z > a, these inequalities are reversed, it follows that, if we substi- 


tute wx for the exponent in (3), then, if w < @ the left-hand side of (3) 
is increased, while the right-hand side is decreased, but, if œ > 0, the 
reverse is true. This establishes the lemma. 
We now return to the proof of Theorem 7.4.3. Since by defining 
L* (0) = Lilo) — min L;(w) we do not alter either the class © or the 
J 


hypothesis of the theorem, we may assume, without any loss of gen- 
erality, that Lilo) = 0 for all w e I; We may also assume that neither 
I nor I; are empty. For suppose J, were empty. Then Lo(w) < Lilo) 
for w e Iz by assumption 1 and Lo(w) < Llw) for we Is, s > 2 by as- 
sumption 2. Thus Lolo) < Ly(@) for all w, and we can delete action 1 


from consideration. [See Fig. 11.] Similarly for Iz. 
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Let S: = U J;, and let w; (i = 1, 2, ---, k — 1) be the right end 
ji + 


points of Si (Note that S; is an interval since it is the union of adja- 
cent intervals. Also, w; is always in the range of w since I, and I; are 
non-empty.) : 

Now let y be any decision procedure which is not of the type speci- 
fied by the theorem, i.e., monotone. We shall show that we can find a 
monotone decision procedure which is better. We shall assume that 
yl | x) #0. (fy(l | x) = 0, the better procedure which we shall con- 
struct will simply never employ action 1.) Define 


O elds Zyl), i=1,-,k  eOl2)=0 
j=l 
Apply Lemma 7.4.1 to 


(5) È eli | 2)B(O)e*r(x) = c 


with 0 = w;, and obtain a ĝ; and a corresponding point x; for i = 1, 2, 


k= 1. Define (0| x) = 0, (k| z) =1. Observe that ai 


| x) is 
monotone in 7, for 


(6) E elil Blora) 


= D gli | 2)B@e*r(x) 


S E eli + 1| abode rla) < E ea 1| ablo ra) 


The second inequality follows from Lemma 7.4.1 since w; < wy41. 


Hence, from the form of the ĝ’s, gli] z) <ĝli+1 | x) for all x. Ob- 
serve also that ĝ(i | x) is either 0 or 1 for all x with the possible excep- 
tion of the point 2;. 


Consider a new decision procedure ¢ defined by 


@) ala = etla) — g@-1]2) 


and let p(o, Y) be the risk with the given procedure. For any we 
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there is aj such that w £ J;, so that we may writs 


P 
(8) p(w, ¥) = E È Lilo)¥@| 2era) 
jal z 


5 
= E E Lille | x) — gli — 1| D6@)er@) 


iml 2 


ja e 
= E LLleG| 2) — eG — 1| D)B)er(@) 


i=l z 


k 
+ E E Lolila) — eG — 1| ee)e*r@) 


t=j+l z 


since L,(w) = 0 for w £ I; Also by rearrangement of terms we get 


m. 
0) plo Y) = D (Lie) — Li] L of | 2)B(w)e**r(x) 
t=1 z 


k 
+ © Lie) - Lilo) fı -YeG-1| oBer} 
iss z 

The first term on the right-hand side of (9) is obtained from the corre- 
sponding term in (8) by collecting coefficients of elil x) for a fixed x. 
The second term of (9) is obtained from the corresponding term of (8) 
= 1 — (1 — ¢(é| x)) and then collecting coefficients 
Note also that ¢(0 | x) = Oand g(1 | £) = 1. 
e theorem, the quantities in brackets in 
by Lemma 7.4.1 the first sum of (9) is 
x) for each 7 since i <j and hence 
erm by ĝ(i — 1 | x) 


by writing ø(i | 2) 
of 1 — (i | x) for a fixed x. 

Now, by hypothesis 2 of th 
(9) are non-negative. Hence 
minimized term by term by ĝ(t | 
wi <w. The second sum is minimized term by t 


since here w <w;-1. Thus we have 


j-1 
(10) plo, Y) > E Lilo) — Lega) Z ole PEOGA) 
i=l =. 


k 
5T ma-i fı -2Zəi-1l nyateee*r(a)} 
i=j41 z 
Working back by the same process we get that the right-hand side of 
(10) is p(w, @) so that 
(11) p(w, Y) = plo, 2) 
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This proves the theorem since clearly æ is a monotone-decision proce- 
dure of the required type. 

Note that, if the inequalities of condition 2 of the hypotheses of the 
theorem are strict, then the inequality in (11) is also strict for all w, as 
can be seen from (10) and Lemma 7.4.1. 


Corollary 1. Let G be a game satisfying the hypotheses of Theorem 
7.4.3, and let ¢ be any a priori distribution of w. Then, among all de- 
cision procedures which are Bayes against §, there exists one which is 
monotone. If the inequalities of condition 2 of the hypotheses of Theo- 


rem 7.4.3 are strict, where j can equal 7 but not s, then all Bayes solu- 
tions are monotone. 


Proof. The proof follows directly from the definition of a Bayes 
solution. 


Corollary 2. Let G be a two-decision game satisfying hypothesis 1 
of Theorem 7.4.3. Furthermore, assume that @ is an interior point of 
Q = (a, b) and that there exists no interval around @ in which either 
Lilo) or Le(w) is identically zero and that Li) = 0 for w <6 and 
Lalw) = 0 for w > 6. Then for any procedure ¢ which is not monotone 
there exists one and only one monotone procedure which dominates it, 


Proof. Let » be any decision procedure which selects action 1 with 
probability (x) and action 2 with probability 1 — (x), 
w €9, let r(w| 9) be the probability of taking action 1. T 


(1) tllo) = Z o(e)p(a | o) 


and for any 
hen 


In terms of the function 7 the risk is given by 
(2) plo, 9) = Li(w)r(o lo) for w>0 


= Lo(w)(1 — rw | ¢)) for w<oe 


> ze and if p(z,| 0) >0 and 
bility 4(z,) given by (1) Lemma 
e any monotone procedure, and 
unless ¢' = ¢ (and hence v=), 
t that e >c, Then, since for a 
uous function of w, there exists an 
for all wel, rw | $) > Tw | g). 


let 7(6| $) = c’. We shall show that, 
Y does not dominate p. Assume firs 
fixed-decision procedure + is a contin 
interval I containing @ such that, 
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Also, by hypothesis there exists an w = wo in I such that Ly(wo) > 0. 
Consequently, we have 

(3) p(wo, ) = Li (wo) (wo | Y) > Lr (wo)r(wo | 2) = pleo, o) 

and does not dominate y. A similar argument holds in case c’ < c. 
This completes the proof of the corollary. 

Corollary 3. Let G be a game satisfying hypothesis 1 of Theorem 
7.4.3 with A = (1, 2) and with neither Lı(w) nor Lo(w) identically 
zero. Then the monotone procedures form an admissible and hence a 
minimal complete class. 

Proof. In view of Theorem 7.4.3, it is sufficient to show that no 
monotone procedure can dominate another monotone procedure. But 
this follows from the fact that, if we increase ¢, we increase 7(w | 4), 
where 
(1) n(o | 8) = Pal < ze) + $(ee)p (te | w) 
and hence increase p(w, ¢) [see (2), corollary 2] for w > 0; and, if we 
decrease ¢, we increase 1 — r(w | 2), and hence p(w, ¢) for w < 0. 

It can be shown that Theorem 7.4.3 applies to more general distribu- 
tions than were considered here. For example, if the elements in PQ are 
probability distributions or densities of the form 


(i) ple |) = F | og) 
where f(x | w) satisfies the conditions that, if x; > t2, w1 > ws, then 
Gi) fs | orf (a2 | 02) — Fez | er)f(@s | 92) 2 0 


and, if the range of w is an interval, the class of monotone procedures is 


a complete class. Condition (ii) is clearly satisfied by the exponential 


class, since, in this case, 


Flees | eos) flea | 02) — Fez | ofle: | 2) = 

As an illustration of a monotone procedure where the distribution is 
not of the exponential type, consider the following problem: We are 
given a class of rectangular distributions defined over the range (0, w). 
On the basis of N independent observations tı, +*+, ty we wish to test 
the hypothesis that w < 0 against the alternative hypothesis that 
© > 6. The loss function Li(w) > 0 for w > and zero elsewhere. 
The loss function L2(w) > 0 for w < 0 and zero elsewhere. It is de- 
sired to obtain a Bayes procedure against an a priori density (w). Let 


E(w) = alw) + (1 — a)l) 
where à; (w) is a density defined over the interval (0, 0), A2(w) is a den- 


Kereta (e =z) (wiw) 1) 
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sity defined over the interval (8, ©) and 0 < æ < 1. Let x = (4, =+, 
xy) and t(x) = max (21, +++, ty). Then the density of x is given by 
p(z | w) =1/ for t<w 

=0 for t>o 


To obtain the density of ¢, we observe that the cumulative distribu- 
tion function of ¢ is given by 


F(a| w) = Polt < a) = (a/w)® 
Differentiating the above expression with respect to a, we get F’ lalo) 
= N (a/o). Hence 
altl o) = (NNN) for t< w 
=0 for i>w 


The a posteriori risk if we decide that w < 6 is given by [see Theorem 
7.3.1] 


e m=i Í Lalo)alt | orelo) deo 
Ti = 


f loodo 
0 


and the a posteriori risk if we decide that w > 9 is given by 


6 
a Í La(w)a(t |oo) do 
7(2) = 


Cs) 


S lo aa 
0 


Hence the Bayes procedure is to accept the hypothesis that w > 0 if 
0 
ji Niy- Llw) (w) des 
min [t, 6] oY 
h) = — 
N- Ly(e)do(w) 4 


a) 
max [t, 6] WN 


6 


Lolu) (w) 
Í. [tq N dw 


= wo l-a 
a 


< —— 
fi EDn e 
max [t, 6] N dw 


W 


and to accept the hypothesis w < @ if h®) > (1 — a)/a. 
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We shall now show that {t: h() > (1 — a)/a} is an interval (0, ¢*). 
To prove this, it will be sufficient to show that h(¢) is a decreasing func- 
tion of i But this follows since, if é < 6, the denominator of the ex- 


9 
pression for h(¢) is constant and the numerator is of the form f g(w) dw 
t 


where g(w) > 0 which is clearly a decreasing function of t. On the 
other hand, if ¢ > 0, the numerator is zero so that h(t) = 0 and we al- 
ways accept the hypothesis w > 0, as logically we should. 

Fixed sample-size games in which the statistician has only one of 
two possible actions have been considered in Section 6.3 and will again 
be studied in Section 7.7. Although A contains only two elements, 
the space D of pure strategies for the statistician is usually infinite so 
that no general statement can be made about the number of pure strate- 
gies for either player that may be involved in a minimax solution to a 
game of this type. However, in the case of the exponential class the 
minimax solution is often in fact a dichotomy; i.e., nature’s good strategy 
is to mix two points w; and wz in @ and the statistician’s good strategy 
is to employ a likelihood ratio test based on these two points, with pos- 
sible randomization when the class @. is discrete. The results are in- 
corporated in the next theorem. The proof of this theorem is intui- 
tively simple but requires a somewhat lengthy argument in order to 
take care of the various types of behavior of the function p at the end 
points of the interval 2 [see Problem 7.4.2]. 

Theorem 7.4.4. Let G be a two-decision game with a single observa- 
tion in which (I) the elements of @a are of the exponential type; i.e., 
for each w £9, pq is defined by 


p(x |) = B(w)e**r(x) 


(II) Q is an interval (possibly including -+) with left end point a and 
right end point b (III) if a(b) does not belong to Q then either 
(i) a(b) is finite and lim Bw) = 0, or 


w — a(b. 
Gi) a(b) is infinite and er is no smallest (largest) value of x such 
that r(x) > 0. 

(IV) The loss functions L, and Lg are continuous and bounded and 
neither is identically zero. (V) There is a point 6 interior to 2 such that 
Ly(w) = 0 for w < @ and Le(w) = 0 for w 2 6. 

Then there exists a good strategy for the statistician which is mono- 
tone and a corresponding good strategy for nature which is a mixture 
of only two points wı > @ and w2 < 8. 
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Proof. The proof of this theorem depends on the following auxiliary 
propositions: 

A. Let 0 <ec <1, and let v(x) =c. By Lemma 7.4.1 there is a 
dominating monotone procedure ¢, for the given @ such that ¢,(z) = 1 
for x < te $c(%) = 0 for x > a, and, if p(x. | 6) > 0, d-(z-) is given 
by (1) of this lemma. 

For any c, let rw | ¢-) be the probability that we take action 1 if 
we use the monotone procedure ĝe. That is 


rlo | $c) = B(w) 2 e**r(z) + elte) Blo) r(x.) 


Then (wo | ĝe) is for each c a monotone non-increasing function of w. 
Moreover, since (0 | ĝe) = c, this result implies that 


(1) lim r(w |.) =0 uniformly in w for w > 0 
e>0 

(2) lim r(w| ¢.) =1 uniformly inw for w <6 
cml 


Proof of A. Let w < we be elements of 9, and let 
v1, w2) = Tlw | $e)(1 = T(we | $c)) = 1 (we | $e) (1 = (wy | @c)) 


Then, from the definition of +(w| £.) and the fact that $,(x) = 1 for 
z < Te, $-(%) = 0 for z > ze, we have 


y(w1, 2) = KD) D ela) — ge(y) ler tor — oh FM Ir(a)r(y) 
=K > Ge(a)(1 — ely) JAT Former 4-2) — 1Ir(a)r(y) 
20 


where K = 6(w1)(w2). Thus r(w: | ¢.) > rlw | 4). 
B. Let c and ¢, be defined asin A. Then 


(3) lim r(w |£.) = ra|) if asg 
Hi =1 if ag¢gQ 

a) lim (| 4.) = 7(b| .) if beg 
=0 if bgg 


Proof of B. We shall prove (3); the proof of (4) is similar. 

Suppose a is finite and aeQ. Then (3) follows from the continuity 
of v as a function of w for a fixed c. [See Problem 7.4.7.] Ifa = —% 
and a £9, then there exists a smallest x = zo such that r(zo) > 0. We 
may assume that to = 0. Then it is clear that £ is continuous at — ©, 
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and consequently (3) holds. Suppose now a¢2. We may assume that 
Z_ > 0. Then, taking any wo interior to 2, we have 


1 — z(o |.) = bo) ( > e“*r(x) + [1 — A) 


Lera) + 1 A) 


T>Tc 


< 6) ( 


for all w < wọ. Now, by conditions III(), blo) > 0 as w > aif ais 
finite. If a= —%, then, by condition III (ii), we may assume there 
exists an x = % Which is negative and such that r(xo) > 0. Hence 
D r(x) > o asw —> —o so that in this case also lim A(w) = 0. 


ore 


z 
This completes the proof of B. 
C. Let p be the risk function from the given procedure ĝe. That is 


p(w, ĉe) = Ii(w)r(o | 8e) = pr(c, o) for w>0 
= Lo(w)(1 — rlu | -)) = p(o) for w <8 
Then 
(5) lim sup pi(c, #) = 0 
c—0 u20 
(6) lim sup pa(c, %) = 0 
c—1 uso 
(7) sup p1 (6, œ) is monotone non-decreasing in ¢ 
CRAJ 
(8) sup p2(c, w) is monotone non-increasing in ¢ 
w<sd 


Proof of C. The results (5) and (6) follow directly from (1) and (2). 
The results (7) and (8) hold because the function r is non-decreasing in 
c for each w, and hence the supremum of p; (i = 1, 2) has the desired 
property. 

D. For each ¢ with 
an w €Q with wz < 0 such that 


0 <c <1, there is an w €Q with w; > 0 and 


(9) sup pi(c, w) = pile, %1) 
(10) sup p2(0, w) = pa(¢, w2) 


Proof of D. Fora fixed c, 7 is a continuous function of w. Hence by 
condition IV the risks p: (i = 1, 2) are also continuous in w. That is, 
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for a fixed c, pı is continuous in w for w > @ and pe is continuous in w 
for w < 0. Also, if b gQ, then by (4) 


lim pi(c, w) = 0 
wood 


Define 
Pile, w) = p, w) for w20, weQ 


=0 for w=b, b¢Q 


Then p’; is continuous on the closed interval (6, b). It therefore has a 
maximum in this interval. Since 


sup pi(c, w) > 0 
w0 


the maximum cannot occur at 0, nor can it occur at b unless b is an ele- 
ment of Q. 

We similarly define p'2(c, w) and show that p's has a maximum in the 
closed interval (a, 6) and this maximum occurs at an interior point in 
this interval if a ¢Q but may occur at a if a e Q. 

E. There exists a cı and cz with 0 < cı < c2 < 1 such that 


max p;(c, w) < max pe(c,w) for c< cy 
w>0 w<o 
max pe(c, w) < max pi(c,w) for c> co 
o<o w>e 
Proof of E. Define 
p*(c) = max pi(c, w), p*2(c) = max pa(c, w) 
o> w<o 


Then, by condition IV, p*,(c) > 0 for i = 1, 2, and by (5) and (8) we 
can find a cı such that for ¢ < cı 


p*2(c) 
2 


max p1(c, w) < < p*2(c) = max pa(c, w) 
w>0 w<o 


Similarly by (6) and (7) we can find a cz such that for ¢ > C2 
p*1(c) 
2 


max po(c, w) < < p*1(c) = max p1(c, w) 
w<e w>e 


F. There exists (i) a value c with cı < o < co where cı and cz are 
defined in Æ, (ii) an w; with «; > 8, (iii) an w with w2 < 6, such that 


p*1(7) = pi(o, %1) = p*2(0) = p2(o, we) =v 
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Proof of F. Both p*;(c) and p*2(c) are continuous functions of ¢ [see 
Problems 7.4.7 and 1.6.3]. Hence we let c vary in the closed interval 


(c1, c2) and apply D and E. 
G. ĝis a good strategy for the statistician, and nature’s good strategy 


is to mix the points w; and wə given by F. 
Proof of G. Suppose the statistician employs the strategy ĝs. Then, 


for any a priori distribution &, 


Blo(w, 6)] = È p2( | oto) + X alo | w)E() 


ose 


< È o*l) + X o*i (o)Eo) =o 


ws 


Thus the statistician, by employing Ês, can guarantee himself a risk 

not exceeding v. On the other hand, suppose nature mixes w1 and we 
in the proportions y and 1 — Y, respectively, where 

Lalwa) Boa) 

Y Fon Ber) + La(w2)B(w2)e 


(w2—w1)z0 


then the best the statistician can do against this strategy is to employ 
procedure @,, as can easily be verified. 

Note that the boundedness condition on the L;(w) can be relaxed 
without impairing the truth of Theorem 7.4.4. It is clearly sufficient 
for example for the L;(w) to satisfy the condition 
lim p2(c, ©) = lim, pi(c, w) = 0 


for all c. 


The reader is referred to Problem 7.4.8 for an illustrative example. 


PROBLEMS 


7.4.1. Let p(z | u) be given by (7.4.1), and let (w) = 1/6(). Then 


1 
(a) Bult) = cee 

a 
© eža) = Bale — Bale? = Fa 


where the symbol log stands here for the logarithm to the base e. 
7.4.2. Let Pa be of the exponential type, and let Q* = {w: w is finite and >) e”7r(z) 
z 


< o; w = +% and there is a largest z such that r(z) > 0; w = — and there is a 
smallest x such that r(x) > 0}. Then Q* is called the natural range of the class Og. 
Characterize the set 2* for the following classes g: (a) binomial, (b) Poisson, (c) the 
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set X consists of all positive and negative integers including zero and 


r(z) = e!7! 


(d) The set X is as in (c) and 
r(z) = e'#!7(1 + 2%) 


(Note that the natural range for Pg is an interval; see Lemma 11.5.1.) 

7.4.3. Give an example of a discrete distribution ps of the exponential type such 
that (a) for some c, 0 < c < 1, po(x = ze) = 0 where ze is given by Lemma 7.4.1, 
(b) for all c, O < c <1, pole = ze) > 0. 

7.4.4. Prove the converse part in Theorem 7.4.2. 

7.4.5. A sample of 10 observations is taken from a binomial distribution with prob- 
ability w of a “success.” It is desired to test the following three alternative hy- 
potheses: 


(1) 0 < w < 04, (2) 0.4 < w < 0.6, (3) 0.6 <w <1 


The loss matrix is summarized in the following table. 


Decisions 
States of Nature 
1 2 3 
0 <w<04 0 10 100 


A decision procedure y of the following form has been suggested: 
v1 | m) = 0.6, y(2|m)=0.3, ¥(3|m)=01 for m=0, 1,2 
Wilm) = 0.2, ¥2|m)=0.6, y(3|m)=0.2 for m= 3, 4, 5, 6, 7 
y(1 | m) =0.1, y(2 | m) = 0.3, (3 | m) =0.6 for m= 8, 9, 10 


where m is the number of successes in the sample. 

Employing Theorem 7.4.3, compute the monotone procedure ¢ which dominates 
y, and compare the graphs of the risk function p for y and Ẹ. 

7.4.6. Let G be a game satisfying the hypothesis of Theorem 7.4.3 with A = (1, 2), 
and let d be a Bayes decision procedure against an a priori distribution ¢. Then, if 
Elow, d)] = 0, d is monotone. 

7.4.7. Prove that the function + in Theorem 7.4.4 is jointly continuous in w and c. 

7.4.8. Consider Problem 7.3.1. Let A stand for the possible average increase in 
weight as a result of feeding the hogs diet B. The expected profit per hog if diet A 
is fed is $[0.16(250) — 30] = $10. If diet B is fed, the expected profit per hog is 
$[0.16(250 + A) — 82.40] = $(0.16A — 7.60). It is equally profitable to feed diet 
A or B if $10 = $(0.16A — 7.60), i.e., if A = 15 Ib. Assume that, if the farmer feeds 
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diet A and A > 15, his “loss” per hog is measured by La(A) = 0.16(A — 15). If he 
feeds diet Band 0 < A < 15, his “loss” per hog is measured by Lg(A) = 0.16(15 — A). 
Prove that if e < (15/t)\/n, where the constant ¢ is approximately 0.75, then a 
minimax solution to this problem is as follows: Nature’s good strategy is to select 
two values of upins = 265 —0.750/4/n and wp = 265 + 0.750/4/n with equal 
probability. The farmer’s good strategy is to adopt diet B if > 265 Ib. and not 
adopt diet B if x < 265 Ib. The value of the game is 0.0272No/+/n (approx.). 


7.5. Minimax Strategies in Fixed Sample-Size Multidecision Games 


The main result of this section is the following: 

Theorem 7.5.1. Every fixed sample-size multidecision game G = 
(Q, D, p) in which Z is countable has a value and the statistician has 
a good strategy. 


To prove this theorem, we shall need three lemmas which will be 
required in the proofs of later theorems also. Without loss of generality 
we shall assume that the elements of Z are the positive integers 1, 2, ---. 


Lemma 7.5.1. The space & contains a sequence {wg} such that, for 
every w £Q, there is a subsequence {om} of {wg} such that 


lim plom, 9) = plo, 9) 


for all pe ®. 
Proof. For any w and @ in Q define a “distance” function 6 by 
us = 5 ; 
O a= ElL- LAE gll — Gl 
i=l 


i=1 


Consider sequences r = (71, 72) ** -) of rational numbers which have 


only a finite number of non-zero terms. Define 

k o 1 f 
(2) Aon = El) rl + Dl PGL @) — riel 
i=1 j=1 


y w £9, there is an r such that 


We first show that for any e > 0 and an 
M such that 1/2” < ¢/3. 


8(w, r) < e To do this, we select an integer 


We also select a sequence r of rational numbers 7, 72, «++ such that 
(8) | Lilo) — ri | < €/8k, i=1,2, =, k 

and 

(4) | pj] ) — tis | < €/, j=1,2 +, M 


and set r; = O forj > M + k. (This clearly can be done since there are 
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only a finite number of inequalities to be satisfied.) Then for such a 
sequence r 

2 l a ge E€ € 1 
=Sat Ł pPGle) Sorat ga = E 


E 
5) lw, T) <-+ 
(E) Be) 3 Sji pears 


Now for every r and n, let wrn be such that 6(w;n, r) < 1/n, if there is 
an element in Q with this property. Clearly the set {wrn} is denumera- 
ble. Now we have seen that, for any we 2, e> 0 and e < 3, there 
exists an r such that 6(@, r) < «/3. Also we can find an n such that 
6/3 < 1/n < 2/3. Then ô(w, r) < 1/n, and hence there exists at least 
one wrn in the denumerable set such that 5(w;n, r) < 1/n. Moreover, 
it is easily verified from (1) and (2) that 


(6) êlo, wrn) < êlo, T) + 6(wrn, 7) 
Therefore 
(7) 5(w, wrn) < 6/3 + 1/n < « 


Thus we have exhibited a denumerable sequence {wrn} such that for 
any wEe€Q we can find a subsequence {w,,;,,} with the property that 
lim 6(, yn) = 0. We shall denote the sequence {wrn} by {wg} and 
any subsequence of it by {wm}. 

Now consider any w £ 2 and any ge, and let {wm} be a subsequence 
such that lim 6(#, wm) = 0. Then 


thew thts @) | At Be 
where 
An = =e Lilei | DPG | #) — 2 2 Lilo)oli | DPG | wm) 
Bn = 2 x L,(w) et | DPG | wm) — x = Li(om)e(¢ | DPG | om) 
< > pC | wm) = | Lilo) — Liem) le |j) < = Lilo) — Lilom) 


since 0 < g(é| j) < 1 for all i andj and Ð p(j| om) = 1 
j=1 


We observe from (1) that ô(w, wm) — 0 if and only if L;(wm) > Li(w) 
for every i and pi | wm) > pj | w) for every j. Moreover, by assump- 
tion, L;(w) is a bounded function. Hence, by Theorem 3.11.5, lim Am 

mara 


= 0. Also, lim Bm = 0. This proves the lemma. 


m> o 


Sec. 7.5 MINIMAX STRATEGIES IN MULTIDECISION GAMES 197 


Lemma 7.5.2. Any sequence fgn} with y, e& has a subsequence 
{vn} such that yn —> Y eð. 

Proof. Since Z is countable and A is finite, A X Z is countable. 
Also, the sequence øn is uniformly bounded, so that the Cantor diag- 
onal method yields a subsequence {Yn} of {øn} converging to a limit y. 
Since a(i | j) > 0 for all i, j and Ž Ynë | j) = 1 for all j, y has the 

7 


same properties, i.e., Y €®. 


Lemma 7.5.3. Let {Y4} be a sequence in ® such that 


iim palili) = ¥G|D 


for each i andj. Then 


lim p(w, Ya) = plw, W) 
qaa 


for all w€ Q. 


Proof. This lemma follow 

We now return to the proof of Theorem 7.5.1. 

Let {w,} be the countable sequence of elements in Q considered in 
Lemma 7.5.1. Consider a game Gr in which nature is restricted to the 
pure strategies w1, w2, ***, war of the sequence fwa}. By corollary 1, 
Theorem 2.3.3, this game has a value var, and nature has a good strategy 
Bap = (Ey, (1), «++, E*(M)). Moreover, the statistician also has a 
good strategy y*;r, for, since Gyr has a value, there exists a sequence 
{ø} such that 


s from Definition 3.6.7 and Theorem 3.11.5, 


lim p(E*sr, 94) = ar 


oe: 
But, by Lemmas 7.5.2 and 7.5.3, there exists a subsequence {Yn} such 
that 
lim Yn = p*m, 54y 
n= o 
and , 
lim p(E*ar, Yn) = PEM, 2 ar) 


n= a 


Hence we have for all ge® 

O olon ptr) < var S eE*an P), @ = 1,2, +++, M) 

Now vr < sup Llo, a) < +9, and, denoting the set of a priori dis- 
Ibir ~ twa) oy = A(é) < sup A(é) = 

tributions over (wi, +**» ©) by =u, Ym ee (6) a (é) 
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vy 41 Since Zm C EM- Hence we may write 


lim vm = v* 
@) me 
By Lemma 7.5.2, {y*sr} contains a convergent subsequence {9*s4;} 
(without loss of generality we take M; < M41); let lim 9*y, = ¢*. 


ive 


Given wm € {wg}, then by (a) we have, for M: > m, plom, o*m;) S 


vm; <v*. From Lemma 7.5.3, lim pm, p*m;) = plom, ¢*); hence 
plom, o*) < v*. For arbitrary w €Q there exists (Lemma 7.5.1) a sub- 
sequence {wm} of {w,} such that p(w, v*) = lim p(wm, y*). There- 


maa 


fore 
(3) p(w, g*) <v* forall wel 


From (8) it is clear that v*¢ < v* and that if G has the value v* then 
y* is a good strategy for the statistician. But from (1) and (2) we see 


that G does have value v* since for given e > 0 an M, can be found such 
that 


A*r = sup inf pl, g) = inf p(f*a, P) > vme > v* — € 
geZped gen 
Since e is arbitrary, A*r > v*, and this together with v*p < v* implies 
that G has value v*. 
Theorem 7.5.1 as well as Lemmas 7.5.1 and 7.5.3 also hold for the 
case where the p(s | w) are densities. 


PROBLEM 


7.5.1. Let @g, consist of a class of normal distributions with mean ph, —% <p < % 
and a fixed variance o;”. Let Eo, consist of a single normal distribution with fixed 
mean ye and fixed variance o3? where oj? < o. The loss from an error of the first 
kind is unity, and the loss from an error of the second kind is u. (See Section 7.7 
for the definition of these terms.) It is desired to test the hypothesis 2 against the 
alternative 9% on the basis of N independent observations x = 


= (x1, 22, +t, DN): 
Show that 
Eo The good strategy for the statistician is to accept the hypothesis 91 whenever 
N 
È — 2)? < k and accept the alternative hypothesis 92 whenever >» (z; — 3)? > k 
= =i 
N 
where k is a constant and = 1/N > 2. 
t=1 


(b) Nature’s good strategy is to select % with a probability g and Q with a prob- 
ability 1 — g. In 9 nature will select « with density (u) where 


E(u) = NE: e7 N(u-—u)?/2(0:?— 01?) 
Qr(a2? — o1?) 


Sec. 7.6 COMPLETENESS OF THE CLASSES Go Qo AND Q 199 


Hint. Show that for any k there exists a value of g for which the proposed statis- 
tician’s strategy is Bayes against that g and &(u), and that there exists a value of 9, 
and hence a value of k, which makes the maximum risk in 2; and % equal. 

Note that, if nature employs the above (4), the distribution of Z will be the same 
for the hypothesis 9; as it is for the alternative hypothesis. This makes Z useless 


for discriminating between 9; and 9z. 


7.6. The Completeness of the Classes Go, Qo, and @ 
Theorem 7.6.1. In any fixed sample-size multidecision game G = 
(Q, D, p) the class Go of extended Bayes strategies and the class @p of 
extended admissible strategies are complete. 


Proof. The truth of this theorem will follow from Theorems 5.6.3 
and 5.6.4 if we can establish that, for each ¢ in 4, the game G, has a 
value and the statistician has in this game a good strategy, where 


Go = Q, D, Po) 
pelo, d) = p(w, d) — plo, ¢) 
But the steps required to prove the latter result are identical with those 
employed in the proof of Theorem 7.5.1. 


Theorem 7.6.2. In any fixed sample-size multidecision game G = 
(Q, D, p) the class @ of admissible strategies is complete. 


Proof. In view of Theorem 5.7.1, we must establish that the game 


G satisfies the following two conditions: 
1. There exists a sequence {wg} in Q such that, for every w €Q, there 


is a subsequence {wm} with the property that 


lim plwm, p) = Plo, ¢) 


m= o 


for all y in ®. 
2. For any sequence of mixed strategies eo, o”, +++ in ® such that 


p(w, eft) < plo, e) 


for all w e Q, there exists a g* such that 
plo, ¢*) < p(w, eo) 


for all q and vw. 
That the game G satisfies condition 1 follows from Lemma 7.5.1. To 


Prove that G satisfies condition 2, let {y'} be a sequence such that 


p(w, 9+?) < p(w, o) 
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Then by Lemmas 7.5.2 and 7.5.3, we can select a subsequence {yo } 
from the sequence {ø} such that 
lim om = ot 


mao 


and such that 
lim p@, o™) = plo, ¢*) 


for allw. Since by assumption {p(w, y)} is a non-decreasing sequence 
of numbers, it must follow that 


p(w, o*) < p(w, eo) 


for all q and w. This completes the proof of the theorem. 


7.7. Tests of Composite Hypotheses 


Often in statistics a fixed sample-size two-decision problem is pre- 
sented in the following manner: The space Q of nature’s pure strategies 
consists of two subspaces 2, and Q2. The statistician wishes to make 
one decision if we, and another decision if weQ2. The decision is 
to be made on the basis of a sample of N observations x = (a1, £o, °°"; 
zy). The hypothesis that w £9, is often called the “null” hypothesis 
while the hypothesis that w € Qs is called the “alternative” hypothesis. 
If Q, consists of a single element, the null hypothesis is called “simple,” 
and, if it consists of more than one element, it is called “composite.” 
A similar definition is given to a “simple” alternative hypothesis and a 
“composite” alternative hypothesis. In this section we shall consider 
the case where at least one of the two hypotheses is composite. 

If we, and the statistician decides on the alternative hypothesis, 
the error is known as “an error of the first kind.” Conversely, if w € 92, 
and the statistician accepts the null hypothesis, the error is known as 
“an error of the second kind.” 

Let ¢ be a randomized statistical-decision procedure such that, if % 
is observed, the null hypothesis is accepted with probability g(x) and 
the alternative hypothesis is accepted with probability 1 — g(x). Fur- 
thermore, let & and 8 be two functions defined by 


(7.7.1) alu | 4) =1— È o(z)p(z| o), we 
(7.7.2) Bole) = E o(z)p(2| o), welis 


Then, for any w € Qı, a(w | g) is the probability of an error of the first 
kind, and, for w € Q2, B(w | æ) is the probability of an error of the second 
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kind, if the procedure ¢ is used. We also define 


(7.7.3) ale) = sup a(o | o) 
(7.7.4) Ble) = sup Blo | o) 


A problem of statistical interest is usually formulated as follows: 
Of all procedures ¢ satisfying the condition e(¢) = a (i.e., &(w| 9) < ao 
for all we) find that procedure y* which minimizes B(y). In this 
section we shall give a solution to this problem employing game-theo- 


retic considerations. 
Let be the class of randomized strategies y and let y be a vector 


function with values y(v) = (a(¢), A(y)) mapping ® into a set S in 
2-space. Unlike the case of a simple hypothesis against a simple alter- 
native, it can be shown that the set S is not necessarily convex. How- 


ever, consider the subset T of S defined by 
(7.7.5) T = {y(v) = (a(y), Bly): ee and aly) + By) <1} 
We shall show that T is a convex set in the positive quadrant and that 
the lower boundary of T belongs to T, at least for the case where X is 
denumerable. Let y(v*) be a point on the lower boundary of T. Then 
clearly, if a(e*) = ao, 

Ble") = min (0) 


gi alp) =ao) 
so that the points on this boundary are of primary interest in the solu- 


tion of the statistical problem posed above. . 
To show that T is convex, we first observe that, for any two points 


Yle) and y(y) and for 0 < ż < 1, we have 
(7.7.6) tyle) + (1 — DYW) = vile + (1 — OY) 


This result follows since for all w 

ta(w| o) + (1 — dally) = ao] te + = Oy) 
So that taking the supremum with respect to w € Q, on both sides of the 
above equation yields 
(7.7.7) tale) + (1 — Daly) = alto + (1 — DY) 
A similar argument holds for the function 8. Let g be the line segment 
joining the point (0, 1) and (1, 0), and let ga be a randomized strategy 
such that ga(2) =a, 0 <a <1. Then by (7.7.1) and (7.7.2) 


alul g) =1—a forall wet; Bolg) =a forall wen 


So that y(ga) = (1 — a, a) isa point on g. Conversely every point on 
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q can be obtained from a gpa for some a. Also, for any y(y) e T, the line 
segment joining y(yv) and y(¢z) lies entirely in T since for this special 
case equality holds in (7.7.7) (with y = ga) for the functions @ as well 
as B. 

Now let yı £ T, y2 £ T where yı = y(¢), Y2 = y(v). We shall show 
that ty: + (1 — t)y2 £ T for all ¢ withO < t < 1. By (7.7.6) we have 


iyi + (1 — dy2 = y(t + (1 — )Y) = y3 


If the equality sign holds in the above expression for all yı, Y2, and t, 
then the convexity of T is established. Assume that the inequality 


Figure 12 


holds for some yı, y2, and ¢. Then yz is a point in the rectangle desig- 
nated by U in Figure 12. We draw a line segment r from y3 through 
the point ty: + (1 — é)y2 intersecting q at a point, say, y(¢a). Then 
since, as we have seen, r C T, it follows that the point fy; + (1 — t)Y2 
belongs to T. This completes the proof of the convexity of T. 

The fact that the lower boundary of T belongs to T can be proved 
without great difficulty for a denumerable space X by means of Lemmas 
7.5.1, 7.5.2, and 7.5.3 and will be left as an exercise for the reader [see 
Problem 7.7.3] since the result is also a consequence of a minimax 
method for obtaining the points on this boundary which we shall now 
present. Reference to Problem 6.4.9, especially parts b and c, will 
help to understand what follows. 
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Consider the game T = (Q, ®, p) where @ and ø are defined as above, 
the space X is denumerable, and the loss function L is defined by 


Lilo) =u for we 
=0 for we) 
Iolo) =w for we 
=0 for wen 


with u > 0 and w > 0. Since the scale is arbitrary, we can, without 
loss of generality, assume that w = 1. The risk function p(w, g) is then 


given by 


p(w, o) = a(o | o), we 
p(w, o) = ubl | o), w ENR 
We thus have 
(7.7.8) sup p(w, ¢) = a(y) 
wed 


sup p(w, 9) = ubl) 


wen 
Let 
(7.7.9) Tulo) = max [a(¢), uB(y)] 
v*, = inf Tu(y) 
pet 
It follows from Theorem 7.5.1 that the game G has a value for all u 
and the statistician has a good strategy, say, “u. Hence, we have 


v*, = Tu(g*u) = max lalo*u), uBlo*u)] 


Theorem 7.7.1. % 
a(y*,) = uplg*u) = v*u 


Proof. Assume the contrary. Then either (a) v*, = u@(y*.) and 
a(o*,) = v*, — e e> 0, or (b) v*u = a(y*) and uß(g*u) = v*, — ô, 
65> 0. We shall prove that case a is impossible. A similar argument 
shows that case b is impossible also. 

Consider a strategy 
í) go = (1 —A)e*u + Mo, 0<A<1 


where J» is the strategy of always accepting the alternative hypothesis. 
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Then 
(2) p(w, go) = (1 — Apelo, v*u) +, weQ) 


= (1 — A)p(o, g*u), w EN 
Therefore 


(38) algo) = (1 — Aalg*u) +A 
= (1 — X)[v*u — J +A = v*u — A[v*u — e — 1] — e 


and 
(4) uB(go) = (1 — A)uB(y*u) = (1 — N)v*u 
By a choice of \ sufficiently small we can make 
(5) algo) < vu uB(¢o) < v*u 
so that 9*, cannot be minimax, which is contrary to the assumption. 


Theorem 7.7.2. For any u, let y*, be a minimax strategy. Then for 
any ¢ such that aly) < a(y*,) we have B(y) > B(y*u). 


Proof. Suppose there exists a ¢ such that 


aly) < a(y*y), Ble) < B(y*u) 
Then 


max [a(y), uB(y)] = Tulo) < Tuly*u) 


Since y*, is minimax it must follow that Tuly) = Tu(¢*,). Thus ¢ 
is also a minimax strategy, and therefore by Theorem 7.7.1 


alg) = uB(y) = a(y*n) = uB(e*x) 
which is a contradiction. Hence B(¢) > B(y*.). 


Theorem 7.7.3. The minimax risk v*, is a monotone non-decreasing 
continuous function of u. 


Proof. The monotonicity of v*, follows from the monotonicity of 
Tulp) for each yg. To prove the continuity of v*,, let u’ = u + Au with 
Au>0. Then 


v*, = inf max [a(y), w’B(y)] 


lA 


max [a(y*,), u’B(y*u)] 


Aw Aw 
max | v*,, (1 + v*, | =(14 EA 
u u 
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Thus 0 < v*y4au — vřu < = v*, which proves that v*, is continuous 

on the right. Taking Aw < 0 and interchanging the role of u and w 

in the above argument shows that v*, is continuous on the left also. 
Theorem 7.7.4. 

(a) vy > 0 as u—0 


(b) vt, 9 A<1 as u= o 


Proof. (a) Consider the strategy I, of always accepting the null 
hypothesis. Then 
p(w, Iı) = 0, wE 
=u, w ER 


Hence 
v*, = inf sup p(w, 9) < sup p(w, Ii) = u 
a G é 


Thus since 0 < v*, < u, it follows that v*, > 0 as u > 0. (b) Con- 
sider a strategy Iz of always accepting the alternative hypothesis. 


Then 
p(w, I2) = 1, weQ 
=0, w E Qo 
Again 
v*, = min sup p(w, ¢) < sup p(w, T2) = 1 
p v v 
Therefore 


lim v*, =A<1 

Theorem 7.7.5. Let ao be given with 0 < œo < A where A is deter- 
mined by Theorem 7.7.4. Then among all strategies ¢ such that a(¢) 
= ap there exists a strategy y* which minimizes blo). 

Proof. For any u, let *u be the corresponding minimax strategy. 
Then, by Theorem 7.7.1, v*u = a(g*u), and, by Theorem 7.7.3, we can 
find a u such that v*(u) = æo. The theorem now follows from Theorem 
7.7.2, 


Theorem 7.7.6. 
< A for all g € £0. 


Let ðo be a class of strategies such that a(y) = Ay 
Then, inf 6(v) > 0. [See Figure w] 
g ebo 


Proof. Assume that inf Ble) =0. Then for every « > 0 we can 
geto 
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find a pe € Po such that B(¢.) < e Take e = A,/2u. Then 

v*u < Tulp) = max [e(y,), uB(y.)] < max [Ay, 41/2] = Ay 
which is impossible since, by Theorem 7.7.4, lim v*, = A > Aj. 


uso 


We remark that Theorem 7.7.6 no longer holds if we remove the con- 
dition A; < A. For in that case we simply choose u such that 1/u < e. 
Then the minimax strategy ¢*, satisfies a(y*,) = v*, < A and B(e*u) 
= (1/u)u*, < A/u < 1/u < e. In fact, we have the following 


Theorem 7.7.7. There exists a decision procedure ø which is pure 


(i.e., gz) = 1 for all z on a set SC X and g(x) = 0 for all z e C(s)) 
such that alẹ) = A, Ble) = 0. 


Proof. Letu = 2” and let y*n be the corresponding minimax strategy. 
Then 


a(o*,) 1 hig A g 1 

n) = — vp < — < — 

d gn 2 on or 

Define Vala) = min [L Z ¢*;(2)] 
j=n 


Then Yalt) > 9*n(z) 


Moreover, ¥n(x) is monotonically decreasing as n increases. Now for 
w ENQ 


alo | va) = D- hapel E- 9*n(e)p(z | o)] 


= alo | g*a) <A 


for all n. + Also for w € Qs 


Blo | Ya) = E dala)p(e| w) < E E oapel w) 


1 


% 
Qn =k 


ao ao 
m I 
<E E olapla|u) = E Bo les) < Ds 
jan z jan Jan 2 
Since y,,(x) is monotone decreasing in n, it approaches a limit, say 
y*(x) for every z asn > ©, Then 


Jim alo | Yn) = a| y”) < a 
for all w e Q, and 


lim B(w | ¥n) = B(w | y») < lim 5 =0 


Sec. 7.7 TESTS OF COMPOSITE HYPOTHESES 207 


for all we. Thus B(¥*) = 0, and, in view of Theorem 7.7.6, a(y*) 
= A. We must now prove that we can find a ¢ which is pure. Let S 
be the set of all ve X such that ¥*(x) > 0, and let (x) = 1 for all 
xes and y(x) = 0 otherwise. Then for w € 9 


a(o|¥*) = D0 — yepe | o) 
=1- x, ¥*(x)p(e | o) — p> ¥*(2)p(c | w) 


C(s) 


1- Droel) -021- X pelo = ale) 


Similarly for w € Q 


Bo |y) = E wepe lo) =0 


z 


x, v*(2)p(# | w) + v*(a)p(2 | o) 


ze ze C(S) 


> Z relo = Be] y) =0 


ll 


Consequently 
alp) = sup a(o | g) < sup alo | ¥*) = a(¥*) = A 
wed) wer 


Ble) = sup Bw | o) = 0 


we de 


which proves the theorem. 


PROBLEMS 


7.7.1. Write an explicit expression for v*, in Problem 7.5.1, compute a sufficient 
number of points on the lower boundary of T [see expression 7.7.5], and sketch the 
whole boundary of this set, for o? = 1, o? = 2, N = 10. ; 

7.7.2, Employing Lemmas 7.5.1, 7.5.2, and 7.5.3, give a direct proof that the 
lower boundary of T belongs to T in case X is denumerable. 

_ 7.7.3. Under 9 a coin falls “head” with probability either 0.3 or 0.7. Under 22 
it falls head with probability either 0.4 or 0.6. Characterize the test procedure based 
on N observations which for a given œ minimizes £. ae 

7.7.4. Let Pg be the set of all distributions on the positive integers, let 2) be any 
Subset of Q, and let Q = Q — 9%. Show that, for any sample size N and any ¢, 


ale) + Bly) > 1. 


CHAPTER 8 


Sufficient Statistics 
and the Invariance Principle 


in Statistical Games 


8.1. Introduction 


A statistical decision is usually made on the basis of the outcome of 
an experiment. The experiment will often contain information that is 
not necessarily relevant to the making of the decision. Thus, for ex- 
ample, in the rocket-firing experiment discussed in Section 3.8, the out- 
come reveals not only the number of defective rockets in the sample but 
also the order in which the defectives occurred. However, the order 
of occurrence of these defectives may, for example, be completely irrel- 
evant to the decision whether to accept or reject the lot of propellent 
powder. 

To discover what information is and what is not relevant in a given 
experiment is an important problem in statistical games. This prob- 
lem may be more generally posed in the following manner: Suppose we 
are dealing with a statistical game T = (Q, $, p) where @ is a class of 
randomized strategies defined on A X Z. Does there exist a partition 
of Z with the property that every possible risk attainable with a com- 
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domized) decision functions is only a subset, and sometimes a very 
small subset of the original space of decision functions. Partitions as 
informative as Z are given by the principle of sufficiency, or what is 
more commonly known as the principle of sufficient statistics. 

A further reduction in the decision space can often be made, using 
the so-called invariance principle. Often, the invariance principle leads 
to simplifications even more striking than those obtained from the 
sufficiency principle, but the guarantees are somewhat weaker. We 
cannot assert that the invariance principle can be used without loss. 
However, weaker statements can be made such as that it can be used 
without enlarging the maximum risk. That is, the invariance prin- 
ciple states that under certain conditions there exists a subset Y of & 
with the property that for every ye there is a ye Y such that 


sup p(w, Y) < sup p(w, ¢) 


Thus, if ọ is minimax, so is y, and, if there is a minimax strategy, there is 
One in YW. Moreover, a minimax strategy in Y turns out often to be 
an admissible strategy. , 
, Loosely speaking, the invariance principle asserts that, if a problem 
1s invariant under a change of coordinates, the solution should also be 
invariant under this change of coordinates. For instance, if one wishes 
to estimate the mean height in a population on the basis of a random 
Sample of N individuals with observed heights 2, ---, xx, the estimate 
Will be a function ¢ of 21 , +++, ty. If the observed heights are measured 
in inches, and for some reason the estimate is wanted in centimeters, 
two plausible estimates in centimeters are ci(tı, «++, xy), obtained 
simply by changing the estimate ¢ to centimeters, and é(cx,, +-+ ‘; ctn), 
obtained by changing the observations to centimeters and using the 
Original estimate ¢ where c = 2.54, the number of centimeters in an 
inch, The invariance principle requires that the estimates should agree, 
Łe., that only those functions ¢ should be considered as estimates which 
Satisfy ct(z,, +++, ty) = t(cry, +--+, ety). Examples of estimates with 
is Property are the arithmetic, geometric, and harmonic means of the 
2s; the median of the a’s, ete. A further application of the principle 
of invariance, or the principle of sufficiency, for that matter, would 
lead us to consider only estimates that are symmetric functions of 2’s, 
thus rejecting estimates such as (xı + tx)/2, or any weighted average 
With unequal weights. me 
‘his chapter will deal with the general problem of characterizing 
tl e sample spaces Z which admit non-trivial partitions that are statis- 
tically equivalent to Z. It will be shown that minimal such partitions 


210 SUFFICIENCY AND THE INVARIANCE PRINCIPLE Ch. 8 


always exist, and a general method for constructing them will be given. 
This problem in a somewhat more general form will be again taken up 
in Chapter 12. 

While the general formulation of the invariance principle, to say 
nothing of the proof, is beyond the scope of this book, a few special but 
important cases of this principle will be proved, and applications to 
sampling from a finite population will be given. We shall return briefly 
again to the problem of invariance in Chapter 11. 


8.2. Partitions of Z Which Are Sufficient on x 


In the next four sections we shall assume that the sample space Z = 
(Z, 9, p) is such that for each ze Z there exists at least one w e Q with 
p(z|) > 0. When Z consists of a denumerable set of points, this 
simply requires that we restrict the sample spaces to points that can 
be considered as realizable through an actual experiment. 


Definition 8.2.1. Let Z = (Z, 2, p) bea sample space. A partition $ 
of Z is said to be sufficient on Z if for every bounded function f defined 


on Z and for every Se, B.(f|S) [see Definition 3.6.1] is independent 
of w for those w £ Q for which Pals) > 0. 


We remark that there always exists a sufficient partition on Z, namely 
the partition formed by the individual points of Z. However, for many 


sample spaces, there exist less trivial sufficient partitions. Examples 
of such Z will be given below, 


Theorem 8.2.1. A necessary and sufficient condition that a partition 
S on a sample space Z = (Z, Q, p) be sufficient is that there exist a fac- 


torization of the form 
@) Pela) = gu(e)r(e) = gle | w)r(2) 


such that for each w the partition $ is a subpartition [see Definition 
3.11.5] of the partition of s,, determined by gu (i.e, gu(z) is constant 
on each S e 8) and such that the function r depends only on z. 


Proof. (a) Necessity: Let $ be a partition of Z which is sufficient on 
&, and let s be a function defined on Z with values in § such that s(z) 


= Sif z e S, where S is an element of S. Now for any bounded function 
f on Z we have by definition 


(1) E.(f) = & Eq(f | 8)P.(S) 


where the summation is taken over those elements S e $ for which P.(S) 
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> 0. Let z be fixed and consider a function f defined by 

SY) =1 if y=z 

Sy) =0 if yz 


Then, for this f, B.(f) = p(z| o), and by (1) we have for all w for which 
Po(s(2)) > 0 


(2) p(z| w) = Po(s(e))Zu(F | s) 
Since 8$ is sufficient on Z, Eo(/| s(2)) is independent of w. Thus setting 
(3) gle | w) = Pu(s(2)), r(2) = E(f | s(2)) 


equation (2) becomes 
9 pelo) = gel orl) 
Moreover, it is clear from (3) that, for a fixed w, S is a subpartition of 
Seu [see, for example, Lemma 8.2.1 below], and that (4) holds trivially 
1n case Pa(s(2)) = 0. 

(b) Sufficiency: Assume that (i) holds and that for each w the partition 
S$ is a subpartition of Sg To show that S is sufficient on Z, let f be 
& bounded function on Z, let 0 be any element of Q, and let Ses be 
Such that P(S) > 0. Now, by definition, 

dX f@ge| Or) 


6 3 E T 
) E(f |8) ADO 
Since $ is a subpartition of Sg it follows that g(z | 0) is a non-zero con- 
Stant for all z eS so that 
È Iere) 


6 7 a i u 
(6) E(f | 8) Ere 
ZES 

Thus § is a sufficient partition on Z. This completes the proof of the 
theorem, 

Definition 8.2.2. Let Z = (Z, 9, p) be a sample space. A random 
Variable ¢ defined on Z is said to be sufficient on Z if the partition $; 
of Z determined by ¢ is sufficient on Z. 


Theorem 8.2.2. A necessary and sufficient condition that a random 
Variable ¢ be sufficient on Z is that for each w e 2 there exists a function 
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qa defined on the range of t and a function r defined on Z and independ- 
ent of w, such that 
(i) D(z | w) = tre) = g(t) | w)r(2) 

Proof. In order to prove this theorem, we require the following 


lemma. 


Lemma 8.2.1. Let u and v be two functions defined on Z, and let 
Su and S, be the corresponding partitions determined by them. Then 
a necessary and sufficient condition that 8, be a subpartition of Sy is 
that there exist a function w such that v = w O u; i.e., 


for all ze Z. v(z) = w(u(z)) 


Proof. (a) Sufficiency: Assume that a function w exists such that 
v=wouw. Lety be fixed, and let 


(1) Uy = fe: wz) = y} 

Then by definition U,eS,. Let w(y) = vo, and let 

(2) Vy = {z: v(2) = vo} 

Then, again by definition, V. v € Sv. Now let z belong to U,. Then for 


that z v(z) = w(u(z)) = wy) = v 


and hence ze V,. Thus U, C Vy and consequently S„ is a subparti- 


tion of S». 
(b) Necessity: Assume S, is a subpartition of S, and for a fixed y 
define U; as in (1). Then, for all ze U,, v(z) = vo where vo is some con- 


stant, since by assumption 8, is a subpartition of $,. Define w(y) = 
v(2) for all ze U,. Then v = w O u which proves the lemma. 

We now return to the proof of Theorem 8.2.2. Assume (i) holds, 
and let $; be the partition determined by the random variable ¢, and 


for any w let gale) = lte) 


Then, by Lemma 8.2.1, S: is a subpartition of Seay 
rem 8.2.1, $ is a sufficient partition on Z, 
statistic. This proves the necessity 
sufficiency of this condition, 
&. Then by Theorem 8.2.1 


and hence, by Theo- 
and therefore ¢ is a sufficient 
of condition (i). To prove the 
assume that $; is a sufficient partition on 


P| u) = galr) = ge | a)r) 


and $; is a subpartition of Sew Then by Lemma 8.2.1 there exists a 
function g, such that Jo = go O t; i.e., 
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gelo) = g.(¢@)) = a) | ») 
which proves the theorem. 


Theorem 8.2.3. Every sufficient partition on Z determines a suffi- 
cient statistic. 


Proof. Let S= {Sq} be a sufficient partition on Z. Define, for ex- 
ample, ¢(z) = @ for all zeS,. Then, by definition, ¢ is a sufficient 
Statistic. 

We note that to a given sufficient partition there can be made to 
correspond an arbitrary number of sufficient statistics so that the no- 
tion of a sufficient partition is a more fundamental notion than that of 
a sufficient statistic. 

The following are two examples of sufficient partitions and sufficient 
Statistics. 

1. An experiment consists of N independent trials, each trial result- 
ing either in an event Æ or an event Ẹ. The probability that Æ occurs 
is w, and the probability that F occurs is 1 — w. The sample space £ = 
(Z, 2, p) where Z contains 2% points z, each of which is a sequence of 
N elements of the type Æ or F, 2 is the closed interval (0, 1). Consider 
the partition $ = (So, Sı, **', Sy) where Sm consists of all points z 
containing exactly m events labeled Æ. It is easily verified that Sm 


contains Si points of Z, $ is a sufficient partition on %, and m is a 
m 


sufficient statistic defined on Z. 

2. Let an experiment result in a point « = (a1, =e ty) where the 
xs are independent observations from a Poisson distribution having a 
Mean value w. Here the sample space X = (X, Q, p) where X consists 
of all points in N space with non-negative integral coordinates and p 


is defined by 


eN gt) 
p(x | w) = N , 


Il zi! 
i=l 


N . . 
Where t(x) = X z; The function ¢ is a sufficient statistic on X, and 


O<w<o 


i=l 


the partition S: determined by ¢ is a sufficient partition on X. 


PROBLEMS 


8.2.1. Let S be a partition of Z. A function f on Z is said to be S-measurable if 
Fe) is constant on each SeS. Prove that a function f is S-measurable if and only if 


1S a subpartition of Sy. 
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8.2.2. In view of the results of Problem 8.2.1, restate Theorem 8.2.1, in terms of 
an S-measurable function. 

8.2.3. A coin with unknown probability w of falling head is tossed repeatedly 
until exactly 10 heads have been obtained. Show that the number of tosses is a 
sufficient statistic on the appropriate sample space. 

8.2.4. License numbers from a random sample of 100 automobiles registered in a 
certain state are observed. Show that the largest license number observed is suffi- 
cient for estimating w, the total number of automobiles registered in the state. 

8.2.5. A die with probabilities œ = (w1, -+-, ws) of showing 1, ---, 6 is rolled N 
times. On what sample space is the number of observed 1’s sufficient? 


8.3. The Principle of Sufficiency 


In this section we shall prove that, if Z = (Z, 9, p) is a sample space 
and § is a sufficient partition of Z, then S is as informative as Z. Be- 
fore we prove this result and make the notion of “as informative” more 
precise, we shall first generalize Theorem 7.2.1 to statistical games in 
which the space A of actions is not restricted to be finite. 

Let Z = (Z, Q, p) be a sample space, A an arbitrary space of actions, 
D a class of decision functions mapping Z into A, and H a class of ran- 
domized strategies; i.e., each element 7 £ H is a mixture of a denumera- 
ble number of elements dı, d2, -+-+ in D in proportions, say, ^i, de, ++, 
respectively. Furthermore, let be a class of randomized strategies 
defined on A X Z [see Definition 3.6.6] such that, for a fixed z, g(a, 2) 
= ola] z) is non-negative, is zero except for a countable number of 


elements a of A and >> g(a | z) = 1 where it is understood that the 
aeA 


summation is taken over a countable set. 


Theorem 8.3.1. Let G = (Q, D, p) be a fixed sample-size game, and 
let T = (Z, H, p) be its mixed extension. Consider another game T* = 
(Z, ®, p) where each element ge® has the property described above. 
Then, (a) for each n e H there exists a ge ® such that p(¢, n) = p(é, p) 
for all £ e Z, and (b) for each y e® and each £ eZ there exists a sequence 
Nm, M = 1, 2, --+ of randomized strategies in H such that 


(1) jim, PCE, mm) = pẹ, o) 


Proof. The proof of part a of this theorem is the same as that given 
in Theorem 7.2.1 since the finiteness of A did not enter into considera- 
tion. To prove part b, we let Z’ = {z1, z2, +++} be the denumerable 
subset of Z such that >> &(w)p(z | o) > 0, and, for each i let az; (j = 1, 


2, +++) be the elements of A (which are assumed to be distinct for that 


i) such that 
È e(ai;| 2) = 1 


I 
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Consider the totality of m-tuples of positive integers (Jı, je, --*, Sm) 
where for each fixed k (k = 1, 2, ---, m) the range of jp is finite or de- 
numerable depending on whether the range of the subscript j in the 
element az; is finite or denumerable. We define 


(2) dinja (21) = 0yo Diiin (22) = Gain ty Ojj in (Cm) = Orin 


and d;,j....j,(2:) is arbitrary for i > m. Now consider the randomized 
strategy nm which selects dj,j»...j, With probability \j,j.-.j, Where 
Nija in = Olaiz | 21) 0 C2r | 22) +++ olamin | 2m) 
By part a of this theorem there exists a ¢ in &, say gm, Which is equiva- 
lent to nm. Consider the element z; £ Z. The randomized strategy 1m 
will select the element aj, when 2 is observed if and only if the pure 
strategy dj,js...j, is selected with jı = n. But the probability that this 
will occur is given by 
E glan | 21) e(aein | 22) +++ PCmin | 2m) = (din | 21) 
dati 
that for a given z; (i < m) the 


for all n. A similar argument shows n 
with probability 9(a.;; | 2) for 


strategy nm will select the element aij 
all j; It must follow therefore that 


enla lz) = e(a|a) for i=1,2, m 


The loss function L is bounded by assumption, so that there exists a 
K with | L@, a) | < K forall we2 andaeA. Now 


| oo, om) — ale 9| = 155 E Lo dpl: | Menla] 2) = ell 
i=l aed 
< È E |Le o |pes| ad] enla| a) - eela l 


t=m+laeA 


<K > E [em(a| 2) + ela | ede |o) < 2K > Pe | «) 
i=m+1 ae A 
Let m be so large that > pa | w 
i=m+1 


Positive quantity. Then 
| p(w, gm) — Pls | <e 

Which in conjunction with the corollary to Theorem 3.11.5 

theorem. 


We make the following remarks 
Unlike Theorem 7.2.1, this theorem 


) < 6/2K where e is an arbitrary 


proves the 


concerning the above theorem: (1) 
does not establish complete equiv- 
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alence between H and @ since it only guarantees that, given a ye, 
we can find an n £ H which results in a Bayes risk which is within e of 
the Bayes risk attainable with ¢ for any arbitrary e > 0. Complete 
equivalence can be established, however, if we admit more general 
types of randomized strategies in both H and than can be considered 
in this book. (2) If Z is finite, say, Z = (zı, 22, -+ +, zm), then complete 
equivalence can be established from the following considerations: Let 
g be given, and let a; (i = 1, 2, ---, M; j = 1, 2, ---) be elements in 
A such that >° g(a;;|z;) = 1. As above, we define 
J 


irja.. jae(2i) = Aix i= 1, 2; very AE 
for all possible sets (j1, jo, ---, jar) of positive integers. We then select 
M 
a strategy dija- jy With probability J] g(a; | 2). 
ii 


We now turn to the proof of the main result of this section. 

Let Z = (Z, 9, p) be a sample space, and let S$ be a partition of Z. 
Let @ be the class of randomized strategies defined on A X Z where A 
is an arbitrary space of actions. For any w €Q and any ge ® let qola | w) 
be the probability that an action ae A will be taken. Then 


(8.3.1) tla | wo) = x ola | 2)p(z| w) 


Thus, for a fixed w, g, is a probability distribution on A. Let Ry be 
the class of functions g, defined on A X Q by (8.3.1) for all yeg. We 
define a class Y of randomized strategies on A X $ in the following 
manner: An element y belongs to Y if y(a | S) is a non-negative real 
number which, for each S eS, is zero except for a denumerable set in 
A, and Dy v(a] 8) = 1. The interpretation of the function y is as 


acA 
follows: If the outcome of the experiment is a point ze 5S, we select 
aeA with probability y(a | S). For any weQ and any yeY, let 
g*y(a | w) be the probability that an action a e A will be taken. Then 


(8.3.2) gyal) = Z E pal Delo) = E vals) E pelo) 
SeSzeS Ses zeS 
= DY v(a|$)P.(s) 
Ses 


Let R*y be the class of functions g*, defined on A X Q by (8.3.2) for 
all y eY. 


Definition 8.3.1. Let Z = (Z, 9, p) be a sample space, let A be an 
arbitrary space of actions, let ® be the class of randomized strategies 
defined on A X Z, let $ be a partition of Z, and let © be the class of 
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randomized strategies on A X $. Then the space Z and the partition 
S are said to be equally informative if Ry = R*y where the elements of 
Qa are defined by (8.3.1) and the elements of ®*,y are defined by (8.3.2). 


We note that by (8.3.1) and (8.3.2) we have 
ep, ¢) = E È Llo, ae(a| ape] o) = = L(w, a)q,(a | ) 


aeAzeZ 
and 


rot = E E E Lo ayv@lapelo) = Z Le, agla] 2) 
aed zes Ses aeA 
Thus, if Z and $ are equally informative, then every risk which is at- 
tainable by knowing the outcome of the experiment z is also attainable 
by only knowing which element S of $ contains the outcome z. 
Theorem 8.3.2. Let Z = (Z, 2, p) bea sample space, and let $ be a 
sufficient partition on Z. Then Z and $ are equally informative. 
Proof. Since by assumption $ is sufficient on <, then, by Theorem 
8.2.1, every element of Po factors; i.e., 


(1) pelo) = ge | «)r@) 
where, for a given Se $, g(z | w) is constant for all zeS. Now let ye, 


and let g,(a | w) be the probability that an ae A will be selected for 
any we. Then by equation 8.3.1 


(2) qe(alo) = È ela | zg | u)r) 
zeZ 


for all we. We now define a class ¥ of randomized strategies y on 
4 X 8 as follows: For each ge we let 
> gla | z)r(2) 
3 = M 
(3) v(a| 8) EO) 
zeS 


for all Ses Note that the denominator in (3) is not zero 
Semon oed. (No agraph of Section 8.2.) 


y the assumption made on Z in the first para 
learly y is a non-negative function on A X $, is zero except on a de- 
numerable set of A, and for a fixed S 


Z E ealare) 


zeSaesA 


~ re 


ees 


(4) E vals) = 1 


aed 
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Now for any we let g*y(a|) be the probability that action ae A 
will be selected if the strategy y is employed. Then by (8.3.2) 


(6) gya] o) = za | 2) x 9 | «)r(@) 


Now, on any S in 8, g(z | w) is constant. Thus from (3) we obtain 
(6) v(a| S) do | w)r(@) = X o(a| dge] e)re) 


and hence 


(7) gyal) = 2 Z ola | z)g(2| w)r(2) = 2 ola | z)g(z| w)r(2) 
Comparing (2) and (7), we conclude that Ry C R*y. Conversely, let 
yet. We define (a | z) = yla | S) for all zeSes. Then clearly 
A*y C Ry, and hence Ry = R*y. 

We remark that, for those S eS for which P,(S) > 0, y defined by 
(3) is in fact the conditional expected value of y, given that z eS. 


PROBLEMS 


8.3.1. A coin with unknown probability w of falling head is tossed five times. If 
at no time during the tosses does the number of tails exceed the number of heads, 
an observer will decide that w > 1/2; otherwise he will decide that w < 1/2. Con- 
struct an equivalent decision function based only on the number of heads obtained. 


8.3.2. Give an example satisfying the hypothesis of Theorem 8.3.1, in which the 
classes H, ® are not strictly equivalent. 


8.3.3. Prove that, if Q is finite, the convergence specified by (1) in Theorem 8.3.1 
is uniform in £. 


8.3.4. Prove the converse of Theorem 8.3.2. That is, show that, if Z = (Z, 9, p), 
S is a partition on Z, and $ and Z are equally informative for any arbitrary A con- 
taining at least 2 elements, then $ is sufficient on Z. 


8.3.5. Prove that, if a partition $ on Z and Z are equally informative for an action 
space A containing 2 elements, then S is sufficient on Z. 


8.4. Minimal Sufficient Partitions 


Theorem 8.3.2 exhibits the importance of sufficient partitions (and 
therefore sufficient statistics) in statistical games and raises the ques- 
tion whether for a given sample space Z = (Z, Q, p) there exists a mini- 


mal partition which is sufficient. Formally, this concept is made pre- 
cise in the following definition: 


Definition 8.4.1. Let Z = (Z, Q, p) be a sample space. A partition $ 
of Z which is sufficient on Z is minimal if any other sufficient partition 


of Z is a subpartition of $. A sufficient statistic ¢ on Z is minimal if the 
partition $; is minimal. 
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We shall now prove that for every sample space in which Z is de- 
numerable there always exists a minimal sufficient partition and there- 
fore a minimal sufficient statistic. In order to prove this result, we 
need the following lemmas. 


Lemma 8.4.1. Let © be any class of partitions on a space Y. Then 
there exists a partition U on Y such that (a) U is a subpartition of every 
S in ©, and (b) if V is any other subpartition of every S in ©, then U 
is a subpartition of u (i.e., U is the coarsest partition that is a subpar- 
tition of every $ eS). 

Proof. For any ye Y and any SeG, let Sy be the set in $ which 
has y as an element, and let 
(1) U, = N Sy 

SeS 
We first prove that the collection of sets U, form a partition of Y. We 
observe that every y e Y belongs to Uy since y belongs to each Sy. It 
remains to be shown that these sets are disjoint. Let ze Uy. Then 
zey for every SeG. Hence S: = S, for every Se G, and therefore 
U, = U,. Now suppose for some x and z in Y, Uz N U+ is not empty, 
and assume U, ~ U;. Let y be a common element of Uz and Ua 
Then U, = Uy and U; = U,, and therefore U, = U; which is a con- 
tradiction. Also, by definition Uy is a subset of S, for every Se S so 


that U is a subpartition of seS. 
partition of every 
Assume now that V is a subpartition of every Se ©. To show that 


Y is a subpartition of u, let V be a non-empty set of V, and let ye V. 
Now, by hypothesis, for every seG there is an SeS such that V CS, 
therefore y eS since ye V, and hence S = Sy. Therefore V C Sy for 
every SeS. Consequently V C Uy by (1) which completes the proof 
of the lemma. 

Lemma 8.4.2. Let Z = (Z, 9, p) be a sample space in which Z is 
denumerable. There exists a sequence w1, #2, *'* in @ such that for 
every ze Z there is an 7 such that pẹ | i) > 0. 

Proof. The proof of this lemma follows from the assumption we 
made about Z in the beginning of Section 8.2. 

Lemma 8.4.3. Let w1, wa, +++ be a sequence in Q specified by Lemma 
8.4.2, and define 

pelo) 


(1) ba =s 


= i) 
2 Pele 


i=l 


220 SUFFICIENCY AND THE INVARIANCE PRINCIPLE Ch. 8 


Let S}, be the partition of Z determined by the function ku, and let © 
be the class of partitions determined by ku for each we Q. Then a par- 
tition $ of Z is sufficient on Z if and only if it is a subpartition of every 
Sk, of ©. 


Proof. Suppose S is sufficient on Z. Then by Theorems 8.2.2 and 
8.2.3 there exists a random variable ¢ such that S; = $ and 


plz | o) = dte) | ore) 
We then have 


hale) — OL) 


z z q(t(z) | «:)r(e) 
i=l 


By Lemma 8.4.2 the denominator in the above expression is positive 
so that r(z) > 0 for all z, and hence 


(2) = Maa = ha(t(z)) 
2z F alte) | w) 


Thus ko = ho O t, and, by Lemma 8.2.1, $ is a subpartition of Sk,- 


Conversely, assume §$ is a subpartition of Sk, for every we2. We 
define 


1 
9 = Dre lod 
Then by (1) i 
p@| w) = ke(2)r(e) 


Let ¢ be a function such that S; = 


8. Then by Lemma 8.2.1 there exists 
a function qu such that 


Kes = Qu OF 
and consequently 


P|) = tre) = ge | w)r(e) 


and, by Theorem 8.2.1, 8$ is sufficient on Z. This completes the proof 
of the lemma. 

We remark that in many statistical problems the sample space Z = 
(Z, 9, p) is such that there exists at least one element 0 eQ such that 
p(z| 6) > 0 for all ze Z. In such cases, all the w; that appear in the 
denominator of the definition of kẹ can be taken to be equal to 0 so 
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that ke is then defined by a likelihood ratio, namely 


p(z| o) 
pelo) 


kolz) = 


and Lemma 8.4.3 can then be stated in terms of this likelihood ratio. 


Theorem 8.4.1. Let Z = (Z, 9, p) be an arbitrary sample space in 
which Z is denumerable. There always exists a minimal sufficient par- 
tition on Z and hence a minimal sufficient statistic. 


Proof. Let © be the class of partitions Sz, on Z. By Lemma 8.4.1 
there exists a partition U such that U is a subpartition of every Sr, in 
©, and, by Lemma 8.4.3, U is sufficient on Z. Also, by Lemma 8.4.1, 
if V is any other subpartition of every Sko (and hence by Lemma 8.4.3 
is a sufficient partition), then VU is a subpartition of ù. That is, U, by 
Definition 8.4.1, is a minimal sufficient partition on Z which proves 
the theorem. 

It is to be pointed out that Lemma 8.4.1 gives a constructive method 
for obtaining the minimal partition from the class of partitions G de- 
termined by kw for each w £ Q. 

As an illustration of minimal partitions and minimal sufficient sta- 
tistics, consider the following two examples. 

Example 1. Let X = (X, 2, p) where X is a su 
®g is a product of exponential distributions; i.e., 


bset of N space and 


N N 


ple loa) = BoR TI re) 


terval (a, b) on the real 


where x = (x dw ranges over an in 
= (a, +++, ty) and w g 
deni |0) > 0 for all X; con- 


line. Here any 9 eQ has the property that p(x 
sequently, we can fix a @ and define ky by 


, pelo) _ peage 
e T r i=l 
e) = relo Leo 
termined by ke is identical with 
N 


t where (2) = >> ti 
i=1 


It is clear that the class of partitions de 


the class of partitions determined by the function 


Thus trivially $, is a subpartition of every Sry and S: is a minimal suff- 
inimal sufficient statistic. 


“lent partition on X, and ¢ is a mi 


222 SUFFICIENCY AND THE INVARIANCE PRINCIPLE Ch. 8 


Example 2. The probability of a coin falling head (H) is ¢; with 
(0 < ¢ < 1), if in the previous trial the outcome was tail CEs and the 
probability of its falling head is ¢2 with (0 < ¢2 < 1) if in the previous 
trial the outcome was head. A coin was tossed, and the outcome was 
tail. An experiment consists of tossing the coin 3 more times, each toss 
being an independent trial. We consider two sample spaces 7; = 
(Z, 21, p1) and % = (Z, 92, p2) where Z consists of 8 points represent- 
ing the outcomes of the 3 tosses, 2; = (0 < $; < 1) X (0 < $x < 1) 
and Q = {(¢1, $2) € N1: $1 = $2 = $}. For any w €Q; and wo E Q2, 
the probability distribution on Z is summarized in the accompanying 
table. Let Sı be the minimal sufficient partition on Z, and $g be the 


Z 


Piz |w) 


palz | we) Si So 
HHH ġġ? 3 (HHH) (HHH) 
HHT diho(1 — ps) (1 — ¢) (HHT) HHT 
HTH or(1 — ġa) $1 — ¢) (HTH) HTH 
THH drpo(1 — ġ1) $1 — 4) (THH) THH 
THT p(l — $1)(1 — $2) (l — ¢)? (ar THT 
HTT p(l — $1)(1 — 2) $l — ¢)? HTT HTT 
TTH p(l — 91)? (1 — ¢)? (TTH) TTH 
TLE (1 — ¢1)8 (1— ¢)3 (TTT) (LTT) 


minimal sufficient partition on 2. Then $, consists of 7 sets and S2 
of 4 sets as is indicated in the table. To show that these are minimal 
partitions we apply the results of Lemma 8.4.2 where we define 


pi(z | «3) 


koj = Sao 
ow pi(z | 0;) 


G = 1, 2;7 = 1, 2) 


where 0; = 0z is the point in Qı with $1 = ġ2 = 1/2. 
Example 2 is a good illustration of how the concept of sufficient par- 


tition and sufficient minimal partition on Z = (Z, 2, p) depends not 
only on Z but also on @g. 


PROBLEMS 


8.4.1. Determine the minimal sufficient 
tangular distributions w = 1, 2, -- 
sample of size N. 

8.4.2. Determine the minimal sufficient partition for the class pq of all discrete 
distributions on the set of positive integers for a sample of size N. 

8.4.3. A necessary and sufficient condition that a partition of Z be sufficient on Z 


is that for any set of the partition and any two distributions in the class ®g the 
likelihood ratio is either constant or undefined. 


partition for the class Eg of discrete rec- 
+, where Poli) = 1/w for i = 1, +++, w, for a 
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8.5. Sufficient Statistics for Densities 


It is beyond the scope of this book to give analogous definitions of 
the notions of sufficient partitions and sufficient statistics in case the 
sample space X = (X, 9, p) is such that X is an N space and @ù isa 
class of densities. Here we shall only remark that, if there exists a 
random variable ¢ such that for each we and each xe X, pla | w) = 
q(t(x) | )g(x) where the function g is independent of w, then ¢ is called 
a sufficient statistic on X. Most of the theorems proved in the previous 
sections are valid for densities—in particular it can be shown that any 
risk attainable by a randomized strategy defined on A X X can also 
be attained by a randomized strategy defined on A X T where T is 
the range of t. 

The following are two examples of sufficient statistics for densities. 

1. Let x = (zı, +--+, tx) be N independent observations from a rec- 
tangular distribution defined over the interval (0, w), and let t(£) = 
max (xı, «++, æy). Then the density of x given w is defined by 


(i) pæ |o) = 1/0" for il£) < w 
=0 for t(z) >w 


Thus ż is a sufficient statistic on X = (X, 9, p) where p is defined by (i). 
2. Let x = (zı, +++, tw) be N independent observations from a nor- 
mal distribution with mean » and variance o2. Then the density p(x | w), 


w = (u, a) is given by 4 ae 
i sca) 15 (8) 


(ii) ple |o) = (z LEA 


2ro. 


N 
N = 
where = = 2 dia; Let t be defined by i(x) = (s, x (a — a°): 


i=1 . se 
Then tisa sufficient statistic on X = (X, 9, p) where p is defined by (ii). 


8.6. Principle of Invariance for Finite Groups 


In this section we will develop the principle of invariance for the 
Special case in which there is a finite group of transformations defined 
on ZX 2X A. To this end we first define formally the notion of a 


group of transformations. 
Definition 8.6.1. Let Æ be a no 
transformations on Æ if 


1. Each element g of G is a one-to-one 
2. For each g in G and h in G, the func 


n-empty set. Then § is a group of 


function from FE onto E. 
tion g O h is in §. 
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3. For each g in G, the function g~! defined by 


t= gy) if g@)=y 
is in G. 
It follows from the above definition that the identity transformation 
I=gog'=g"' ogising. 


Definition 8.6.2. Let Z = (Z, Q, p) be a sample space, A an action 
space, L a loss function defined on 2 X A, let G = (Q, D, p) be a game 
and G a group of transformations defined on Z X Q X A. The group 
G is said to be admissible with respect to the game G if for each g eS 


there exist functions gz, go, ga on Z, 2, and A respectively such that, 
for every ze Z,weQ, and a £ A, 


(1) glz, w, a) = (9z(z), galo), ga(a)) 
(2) Plgz) | galw)) = pe | o) 
(3) La), ga(@)) = Llo, a) 


(Note that, in case ®g is a class of density functions defined on N space, 
condition 2 is to be replaced by 


P(gx(R) | go(w)) = P(R | o) 


where R is an arbitrary N-dimensional interval.) 

The following is an illustration of an admissible group §. Let Z = 
(Z, Q, p) be a sample space in which z = (£1, z2, +++, ty) is a point in 
N space and the coordinates of z are values of normally and independ- 
ently distributed random variables with mean u and standard devia- 
tion ø so that w = (u, o), =% < p < œ, 0 < o < œ and 


aal AEE 


The problem is to estimate u. The action space A is the entire real 
line A = (—% < a < %). The loss function L is given by 


where f is an arbitrary function. 
Let 6, ye A, B #0, e = (1, 1, «++, 1), and let 


gz) = Bet ye, gw) = (Bu t7,|Blo), galo) = Ba + Y 
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Consider the group § with elements g defined by 
gë (e, w, a) = (g2°"(2), g (o), ga°"()) 


Then G satisfies condition 1 for admissibility. It is easily verified 
that G also satisfies conditions 2 and 3. 

We observe that in many statistical problems g4 = I. Thus, for ex- 
ample, if in the above illustration, instead of estimating », we wish to 
decide whether u/c = 6; (action a) or u/s = ôz (action a2) where the 
loss function L is defined by 


Liw,a) =0 if aeei G=1,2) 
L(w, a1) = wa if p/o = 82 
L(w, a2) = w2 if u/s = ôi 
then the group § with elements 
g? = (92°, gaf, D) 


where gz°(2) = Bz, ga? = (Bu, Bo) where 6 > 0 is an admissible group 
for this decision problem. 


Definition 8.6.3. Let G be an admissible group and © a class of ran- 
domized strategies. For ge and each ge® we define a function gg 
on A X Z as follows: 


gela |2) = e(ga*(a) | g2*@)) 


Definition 8.6.4. Let G be an admissible group, £ = (Z, @, p) a sample 
Space, and A a space of actions. A randomized strategy ¢ defined on 
A X Z is said to be invariant with respect to G if, for all geS, ze Z, 


and ae A, 
v(ga(a) | g0) = (| 4) 
Theorem 8.6.1. If ge® is an invariant randomized strategy with 
respect to an admissible group g then for all g, 
oe = 
Theorem 8.6.2. Let T = (Q, ®, p) be a game and G an admissible 
group. Then for each ge and each g £ G 


p(w, p) = p(gal), 2s) 
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Proof. Since G is admissible and each g €G is a one-to-one function, 


po, ¥) = È È Le, a)e(a| zp |o) 


zeZaeA 


> >») L(galw), 94(2))¢¢(9(a)| g2(z))P(gz(z) |ga(w)) 


zeZ 


È È Ligal), a'el | ’)p(e' | galo) 


ZeZaea 


p(9a@); 9e) 


which proves the theorem. 


Theorem 8.6.3. 


If ọ is an invariant randomized strategy then for 
any geg 


pga), p) = p(w, ¢) 
Proof. This theorem follows from Theorems 8.6.1 and 8.6.2. 


Theorem 8.6.4. Let G be a finite admissible group, and let © be i 
class of all randomized strategies. Then for every ge® there is a ¢* 
in © which is invariant and is such that 


sup p(w, »*) < sup p(w, ¢) 
wel wen 
Proof. Let M be the number of elements of G. Define 
(1) et = 1 È or 
M heg 


(a) ¢* is invariant. Take any arbitrary ge. Then from Defini- 
tion 8.6.3 


eel) = 3 Deleld = g E euo io) 


1 
“pe 2) el © h)a™(ga(a)) | (g o h)z™ aze) 


= 


= i È Pzor(gala), gz(2)) 


SL 


= 57D, 00, g0) = o*a la) | o0) 
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1 
(0) sup p(w, g*) <- 2 sup p(w, pe) 
wed M geGwel 


1 
T 2 sup egal), ve) 


1 

= DY sup plo, ¢) = sup plo, ¢) 
M gegen wee 

which proves the theorem. 


Corollary 1. If there exists a finite admissible group § and if the 
game has a minimax procedure then it has a minimax invariant pro- 
cedure, 

Theorem 8.6.5. Let G be a finite admissible group. Let g* be an 
invariant randomized strategy which is not admissible. Then there is 


an invariant strategy ¥* which is better than ¢*; i.e., y* is such that, 
for all w £9, p(w, ¥*) < p(w, o*), and, for some 0 € 9, p(6, ¥*) < Alb, o). 


Proof. Since by assumption ø* is not admissible, there exists a 
Veð which is better. We construct a * as in (1) in the proof of the 
Previous theorem. Then 


(1) p(w, ¥*) = a 5 p(w, We) 


eeg 


1 -1 
= m 2.0. (o), y) 


by Theorem 8.6.2. But y is better than ¢*, and hence 
1 = 
“ %) <4 E o) o”) 
) plo, V°) S 55 2P 
So that, by Theorem 8.6.3, 
i ole, y9 < È E plo, o°) = olo, o") 
M geg 


Now, for some 0 £9, p(0, Y) < p0, ¢*) so that if we substitute 0 for 


® in (2) we get 
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a) o0, w) = 37 D TO, v) 


1 =l 
= 77 PO») + ae (6), ¥)] 
Bx. 


< O 99) + E oO) o 


earl 


1 
=n È 00, 9*) = (6, o*) 
ge 


which completes the proof of the theorem. 


Theorem 8.6.6. Let G be a finite admissible group, and let &* be the 
subclass of © of randomized strategies, each of which is invariant under 


G. Then if o*e%* is admissible with respect to the elements of &* 
then it is admissible. 


Proof. Assume that ¢* is not admissible. Then by Theorem 8.6.5 
there exists a y* e@* which is better than ¢*. But this contradicts 
the assumption that ¢* is admissible in the class &*, 

Theorem 8.6.6 states that, in order to determine whether a strategy, 


invariant under a finite group, is admissible, we need only to test it 
against the class of invariant strategies. 


Theorem 8.6.7. Let G = (Q, D, p) be a game satisfying the condi- 
tions: (i) For any a, be A with a = b there exists an w eQ such that 
Llo, a) = L(w, b). (ii) If pe | w) = p(z | 6) for all zeZ, then w = 0. 
Let G be a group which is admissible with respect to G. Then if g and 
h are elements of G such that gz = hz then g = h. 


Proof. We first show that 


gz = hz implies go = hg. Since G is ad- 
missible, we have 


(1) P(9z(2) | galw)) = ple | w) 
and hence 
(2) Pz | gow) = p(gz(2) |o) 


Similarly, for any h €G, 


(3) Ple | ho(w)) = (hz (2) | w) 


But gz = hz and hence gz~! = hz, and, since the index w is unique, 
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we have go = he. It remains to show that g4 = ha. Now by (3) 
Definition 8.6.2, 


(4) LGalw), gala)) = L(w, a) = L(halw), ha(a)) 


for all ag A and we. But go(w) = ha(w) for all w, as we have first 
shown, so that (4) can be written as 

(6) Llo, ga(a)) = Llo, ha(a)) 

for all we. Now suppose ga = ha; then by assumption there exists 
an weQ such that L(w, gala)) = L(, ha(@)) which contradicts (5). 
Hence g4 = ha, and consequently g = h. 

We remark that condition (i) of the theorem implies that the same 
label has been applied to all actions that result in the same loss for 
each state of nature, and condition (ii) simply states that distinct labels 
imply distinct distributions in Pg. Thus, neither condition is restrictive. 


Theorem 8.6.8. Let G = (Q, D, p) be a game satisfying conditions (i) 
and (ii) of Theorem 8.6.7, and let G be a group which is admissible with 
respect to G. Then if Z is finite so is . 


Proof. Let Z = (21, Z2, +++, zar), andletge§. Since g is admissible, 


g(z, w, a) = (gz(2), galo), ga(a)) 
Also, gz is a one-to-one function on Z so that each gz simply permutes 
the subscripts of the elements of Z. Thus there are only a finite num- 
ber of elements g in § with distinct components gz. The theorem now 
follows from Theorem 8.6.7. 


PROBLEMS 


„8.6.1. Show that in estimating the parameter w (the probability of a success) of a 
binomial distribution from N observations tı, -*, zy With L(w, a) an even convex 
function of (w — a), there is a minimax estimate t such that 


Ken +++, ny) = 1-00 = ty oy 1 = aN) 


8.6.2. Treat the problem of the n-faced die of Section 6.4 in terms of the invari- 
ance principle, and characterize the game G and the group § involved. 


8.7, Application of the Invariance Principle to Sampling from a Finite 
Population 


The concept of a statistical game involving a finite population may 
© formulated in the following manner. We are given a sample space 
= (Z, 2’, p’), and either nature or a conscious player performs a fixed 
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sample-size experiment and obtains a value of a random variable which 
is a point x = (a1, £2, ---, tw) in a space X where each 7; is a real 
number or vector with real components. The statistician has to select 
one out of a class A of possible actions in complete or partial ignorance 
of z and w’. He incurs a loss L which is a function of the action ae A 
selected and the point x, but usually not of the underlying w €Q. 
Moreover, for a given a, L(x, a) is assumed to be constant for all per- 
mutations of the coordinates of v. The statistician can obtain partial 
information on x by observing a fixed number of coordinates of 2. 
The problem is, if the cost of observing x; is independent of i, how 
should the statistician select the sample? We shall here show that the 
principle of invariance together with the principle of sufficiency leads 
to the well-known method of simple random sampling. 

Formally we are dealing with the following entities. 

1. A sample space X = (X, Q, p) where each element xe X has M 
coordinates (£1, %2, ---, £a) and each z; is a real number or a vector. 
It is assumed that if ve X so does any arbitrary permutation of its 
coordinates. The space Q is the same as X; i.e., 


paļo)=1 if t=, peļo)=0 if rw 


2. An arbitrary space A of actions. 


3. A set S of sampling plans consisting of all subsets s = (j1, jo, ++, 


jn) of the integers (1, 2, +++, M) where N < M and no two jis are 
equal. If se is selected, then the N coordinates (ji, jo, +++, jy) of £ 
are observed. 


4. A decision function space S X D with elements (s, d) where d de- 


pends only on the coordinates specified by s and d(x) e A for all v. 


5. A class of random procedures & where each ge® is defined on 
SXAXX, is non-negative, 


is zero except ; 

andiissach that pt on a denumerable set in A, 
(a) 2 Z els, a| t)=1 forall 
(b) 2 els, a| x) is independent of z for all seS 
(c) For each ses 
named in s. seS, o(s, a| 2) depends only on the coordinates of # 

6. A grow ith el T = n 
ich, Ta P § with elements g” defined on A X X where A = SX Å, 


(E, w, 4) = (gx"(z), ga (w), g3"(@)) 


Sec. 8.7 SAMPLING FROM A FINITE POPULATION 231 


where T is a permutation of the integers 1, 2, ---, M, and, if s = (zı, 


aie, tm), 
gx" (x) = (Xr, Xre, +++, Xrm) 


and, if ā = (s, a) = (ju Ja, +++ Jus 0), 
ga? (@ = (Ti, Tjo ++, Tiar, ® = Gs" (s), a) 


where, if k is an integer and T carries k into j, then by Tk is meant the 
integer j. Since there exists a one-to-one correspondence between X 
and Q, the function gq” transforms the elements of 2 in the same man- 
ner as gx” transforms the elements of X. 

7. The loss function L is defined on 2 X A and does not depend on 
s and is constant for all permutations of the coordinates of w = 2. 
We shall write 

Llo, a) = £(w, a) = £l, (s, a)) 

We observe that the group G is admissible [see Definition 8.6.2] for 
by definition g? (2, w, ae (gx" (x), 9a" (0), ga’ (@). Moreover, by the 
definition of £ 

Llo, a) = lga" (w), ga” @) 


= 1 if ga (o) = gx" d the 
Also by definition p(gx" (2) | ga”(w)) = 1 if ga’ (o) gx” (x) an 
latter implies œ = x and hence p(x | w) = 1. A similar argument shows 
that if p(gx7 (2) | ga” (w)) = 0 then ple |o) = 0. Consequently 


p(gx? (2) | ga” (w)) = pe |o) 
Also ¢ is finite. Hence by Theorem 8.6.4 for each vy eð there exists an 
invariant procedure g* such that 
A sup p(w, e*) S siy p(w, 9) 
Now by Definition 8.6.4 

r 

(a) o*((s, a) | x) = o*(ga(s, a) | gx” @)) = e*[(gs" (8), a) | 9x" @)N 
and hence 


©) E sala) = D els"), 0 | gx” Œ) 
acA i aeA 


of (b) are independent of x and rep- 


No i s 
w by definition of ¢ both sides Se ae will ead s PENATI 


Tesent the probability, say 7(s), th 
€nce we have 


(c) t(s) = a(gs” (s)) 
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Summing both sides of (c) over all permutations T and remembering 
that >> x(s) = 1, we get 


M'n(s) = (M — N)! 
or 
a(s) = (M — N)!/M! 

We note that invariance so far considers each subset of the integers 
in order. Although we could have defined the group § so as to elimi- 
nate order, it is preferable to accomplish this by employing the prin- 
ciple of sufficiency since more is guaranteed by it. For this purpose 
we need only to show that the probability distribution of the observed 
sample factors. 

Let u = (Uy, 9, q) be a sample space in which each u = (uy, =, 


uy) £ Uy consists of N coordinates of some ze X and where each ele- 
ment qu € Qo is given by 


0 atu = E Puli s y= ty E 


ji jy M! 
and is the probability that a sample will result in a point u such that 
Sh = Ta = Wy ++, Wjy = Tiy = Uv. Note that Py(w;, = ui, 
= uy) is either 1 or 0. 
We shall say that ue Un is equivalent to w/e U 
mutation T of the integers 1 to N 4 


tt Win 


x if, for some per- 


Be ca 
U1 = Uri, +++, WN = upy 


Consider a collection of sets $ in Uy with each set in $ consisting of all 
on equivalent to a given point. It is easy to show that $ is a par- 
ition. 


In order to show that 
au | w) = gfu | w)r(u) 


it is sufficient to demonstrate th i i 
at q(u is, for a t 
over each set of the partition S 7 | nie ai err 


bel i ' 
some permutation T stone to the same set in $. Then for 
(2) gus +++, wy | w) 


= pur, -- *, UTN | w) 


—N)! 
see Polo = uri, (M — N) 


tr Wjy = UTN 
2 SIN T? M! 


ret jy Politi = ty o sup et 
ji: jy M! 


ll 


Il 


a Why = Un) 
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where r = T—!. Since the summation extends over all values of jı * 


+++3£ jy, equation 2 can also be written as 


(3) qu), «++, wy | w) 
) (M-N)! 


M! 


; -Pali = Yay E r Cie = UN, 
djri: FIN 


ll 


a(t, +++, uv |o) 


which establishes the result. 

p The above discussion shows that the invariance principle in con- 
junction with the sufficiency principle leads to a strategy which selects 
each set s of N distinct integers from 1 to M (without regard to order) 


with probability 


® ie) 


which is the strategy of simple random sampling. 


8.8. A Special Case of the Invariance Principle with an Infinite Group 


The illustration of the invariance principle involving an infinite group 
t of testing a composite hy- 


which we shall treat in this section is tha i 
Pothesis against a composite alternative, with special application to 
the Student ¢ test, and it will be simpler to formulate the problem in 
terms of densities than in terms of discrete distributions. We consider 
a pair of densities p;(t, s), i = l 2 72 <t <% 0 < s < %, so that 


P:(t, s) > 0, foe s) di ds = 1, and suppose 
according to pı or ps, and that the statistician observes (t, u) where 
u = cs and c is an unknown positive constant. The problem is to de- 
cide whether the true distribution is pı OT P2 and a decision procedure 
1s a function ọ with values g(t | £, u), where o(é | t, u) is the probability 
of deciding that p; is the true distribution when (t, u) is observed. We 
Measure the loss by the probabilities of errors, so that 


pli, c; p) = fa | £ cs)p:(t, $) dt ds 


that a pair (t, s) is chosen 


Now, if we change the units in which w is measured, say, U = kv, k > 0, 


the decision function ø, expressed in terms of t, v is ult, v) = gilt, ke). 

ut the problem in terms of £ v is precisely the same as that in terms 
Of t, u, since v = (c/k)s and (c/k) is, like ¢, an unknown positive con- 
Stant, The invariance principle, applied to this case, requires that, 
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palt v) = ot, v); ie, gëlt ko) = gl] t, v), so that, since k is 
arbitrary, the invariance principle requires that, for a fixed 7, » should 
be a function of t only. = 

A property of tests based on ¢ only is given by 


Theorem 8.8.1. For any y and any e > 0 there is a »* based on ¢ only 
such that r 
a:(¢*) < aile) F i=1,2 
where r 
a;(¢) = sup p(?, c; ¢) 


Proof. Write, for any M > 1, 


M dc 
p(t, c; 9) — 


uM c i J 1 i Ge =| ava 
u(i, ¢) = oa = [rss Flog aa a ids 


J im e 


1 sM de 
sd i(t, } t, c) — 
fr ( 9 l log M 3/M al | o) c | 


so that, since fm(i, p) is a weighted average of p(i, c; o), fuli, ¢) < 
ail). We consider the decision function 


alo : i (lt ue 
4|t) =———_— = 
i File? 5 
We have 
1 M de 
i = | pill, i | t, ec) — | dé 
alpan) fa 3 [r Spel 9 =| ag 
so that 
(1) |ui, #) — ailen) | < frit, ) | (AF, i, 5) | deds 
where 


1 M sM de 
g:(M, t, s) = Lf -f il, —| 
2log M LJim a Iso c 


Since 0 < g(i| t, s) < 1, | gM, t, s) | < 1 and 


f de p de 
sM C sM C 


g(M,t,s) > 0 as M — œ forallt,s 


(M, t, s)| < 
| gM, t s) | TAN 


| log | s | 


so that 
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Inequality 1 yields that 
| far, 9) — ailem) | > 0 
as M — o, so that, for sufficiently large M, 
a:(gar) < fur, p) + e < ailp) + € 


Since each gar depends only on t, the proof is complete. 

The reader will note that we have actually proved a little more than 
is asserted in the theorem: Not only was the maximum risk a;(y) not 
increased by restriction to invariant tests, but also a weighted average 
risk far(é, ¢) was approximable, for large M, by the (constant) risk of 
an invariant test. The particular weighted average used here, with 
density 1/c, is the so-called invariant measure for the multiplicative 
group of positive real numbers, since it has the property 


è de T de 
ge Jp É 
for all positive k, a, b. 
As an application of Theorem 8.8.1, we consider the following prob- 
lem: x = (a, «+, ty) is a sample of N independent observations from 


a normal distribution with parameters w = (u, ¢). Q; consists of all 
points such that 


o] 
f rulda = Ps i=1,2 
0 


-1(=)' 
e 2 o 


where 


py |) = 


oV 2r 


i.e., Q; consists of all w for which the proportion of positive individuals 
is p;, and we wish to test pı against po. If we have a; so that 


f pwo ay = n: 
y 


where wọ = (0, 1), Q; consists of all w = (u, o) such that p/o = — a; 
so that the problem may be regarded as testing u/s = —aı against the 


alternative u/s = —az. The sufficiency principle enables us to restrict 
N 


attention to tests based on (7, s), where s = > (z: — @)?, or equiva- 
I 


lently, to tests based on (t, s), where t = @/s. If we write s/o for s, 
the joint density of (¢, s) depends only on y/o; let p:(t, s), ~% < t < &, 
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s > 0, be the joint density of (t, s) for we Q;. If is any decision func- 
tion, we have probabilities of errors 


oti, o, &) = fotili, opili, ) dids 


Thus the problem is exactly that treated in Theorem 8.8.1, and we ob- 
tain that without increasing a;(¢) or a2(v), we may restrict attention 
to tests based on ¢ = Z/s. The famous ¢ test emerges here as a test for 
u/c, rather than as a test for » as it is often described. 


CHAPTER 9 


Sequential Games 


9.1. Introduction 


In Chapter 3 we have given a general description of the structure of 
truncated sequential games for the case where the sequence of possible 
subexperiments to be performed is prescribed in advance. In this chap- 
ter we shall be concerned with the characterization of the class of Bayes 
procedures for truncated sequential games of this type, and we shall 
extend these procedures to the non-truncated case for games with con- 
stant cost and independent and identically distributed observations. 
The main aim will be to develop the theory of sequential-decision pro- 
cedures, Applications of this theory to several classes of problems will 
be given in the subsequent chapter. The reader will do well to refer 
back to this chapter after having become acquainted with the contents 
of Chapter 10. 

In some statistical problems a sequential-sampling plan is given in 
advance. What remains to be determined is the decision procedure to 
be used once experimentation is terminated by this plan. For this and 
other technical reasons it is convenient, as was pointed out in Section 
3.9, to separate the statistician’s strategy into two components, the 
sequential-sampling plan and the terminal-decision procedure. For the 
sake of completeness, we shall summarize the contents of Definition 
3.10.3. 

The elements of a truncated sequential game are: 

(i) A sample space Z = (Z, 9, p’). 

(ii) Vector-valued random variables gi, +++, gy defined on Z. 

Using (i) and (ii), we obtain the new sample space X = (X, Q, p), 
where X = Xı X---X Xw and X; is the range of g; fori = 1, ---, 
N, and each pq in Pg is obtained from a P's in @’, according to Defini- 
tion 3.4.2; i.e., Po is the joint probability distribution of gı, +++, gy 
given w. 

(iii) A space A of terminal actions. In general we place no restric- 


tive assumptions on A; it may be any arbitrary set. 
237 
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(iv) A set D of functions which map J X X into A, where J is the 
set {0, 1, ---, N} and such that, if x, ye X and z; = y; i = 1, 2, ---, 
j, then d(j, x) = d(j, y) for all de D. 

(v) A class S of partitions of X such that 

(a) If Se © then $ = (So, Si, ---, Sy). f 

(b) If z, ye X and t; = y; for i = 1, ---, j, then xe; if and only 
if yeS;. That is, each S; is a cylinder set over K = {reJ: 0 < TSG}: 

The product space Š X D is the space of sequential-decision func- 
tions. A partition $ in © determines a sequential-sampling plan. The 
sets S; of $ are sometimes referred to as “stopping regions.” Since S; 
is a cylinder set, it is always known in a sequence of experiments whether 
or not the observations belong to a stopping region, i.e., whether the 
experiment is to be continued or terminated. 

In addition to the above notions, we also need a cost function and a 
loss function. 


(vi) A non-negative bounded function c defined on J X X such that, 
if z and y are in X and z; = y; for i = 1, +++, j, then 


e(j, x) = celj, y) 


This last restriction on the cost function ¢ amounts to saying that the 
sampling cost of experimentation depends only on the subexperiments 
actually performed. The function c may also be regarded as a set C 
of bounded cost functions c, j= 0, 1, ---, N, where ¢;(x) with j = 0 


is the cost of observing 21, ---, Ti, when x is the point in the sample 
space, and c)(x) = 0. 
(vii) A bounded non-ne 


gative function L defined on 9 X A. L(w, a) 
represents the loss when 


ep Pa is the true distribution of x and the statis- 
tician takes action a. Using (vi) and (vii), the risk to the statistician 
1s given by the function p which is defined as follows [see Definition 
3.10.2]: 


N 


p(w, 8, d) = 4 2 lejla) + Llo, d(j, 2))\po(x) 


j=0 z 


The triple (0, © x D, p) is a truncated sequential game. 


9.2. Bayes Procedures for Sequential Games 

The main results of this sec ) For any arbitrary sampling 
plan S and for any a priori probability distribution £ on Q, there always 
exists, at least to within any e> 0, an optimal terminal-decision pro- 
cedure. (2) For any £, there always exists an optimal sampling plan 
which is briefly characterized as follows: At any stage of experimenta- 


tion are: (1 
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tion, if there exists a continuation that will reduce the risk below the 
present level, we perform an additional subexperiment. If, on the 
other hand, there exists no such continuation, we stop experimenting. 
The above two results are incorporated in Theorems 9.2.1 and 9.2.2 
respectively. 

Let = be the class of a priori distributions over Q, i.e., the class of 
mixed strategies for nature. Then, for any ¢ in =, the expected risk is 


N 


(9.2.1) psd) = 2 > E lce) + LO, a, 2))E@) pola) 


j=0zeSj w 
We introduce the following notation: 


(9.2.2) Pe(x) = D &(w)po(2) 


(9.2.3) @; = the collection of all sets B; such that, for some xe X, 
y £ B; if and only if yi = x; for alli <J. That is, B; is 
a cylinder set over K = {reJ:0 <r <j}. For any 
ae X we also define F;(x) as the set of all points having 
the same first j coordinates as v. Thus, for a fixed J, 

the sets F,(x) are equal for all xe Bj. 
For any bounded function h on 2 X X, let Ej:(h) be the conditional 
expectation of h given tı, T2, ***, 2; when w has distribution ¢, and, 
for fixed w, x has distribution po. For any 2, the value of E;(h) at x 


1 

x 25 E(w) po(y)h(w, y) 
ve F(z) v 

(9.2.4) Ex(h) = 7 Al 


Note that, if j = 0, Go = {X}, so that we can write Eẹ(h) for Eoe(h). 
The function Ej(h) = v(x) is a function of 21, 22, +++, 2; only and 
has the following property: For any bounded function f on X which 


depends only on 21, 22, ***, is 


0.2.5) E E t@)p@Y@h, 2) = LD ke)pale)f@)o(a) 
= Ð, S(@)u(2)Px(2) 


The second equality is immediate, and the first is obtained by substi- 
tuting for v(x) its defining expression above, summing over the ele- 
ments of the partition G;, and using the fact that f is constant over a 
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given set F(x). More specifically, we have 


DD Elope) 


> l X E@)po(y)h(, y) 
= 2 x £(w)p.(x)f(x) (: - X , P;(y) 
2 ; È FYE po(y)h(s, y) 
yeF;(z) 0 
“2% raf > RO 
ye F(z) 
E EIEE, y) 
yeF;(z) 0 
= ay XP ( ÈX Pew) 
X DFO) Mpo(u)h(, y) 
T 2, (z, re) ( > Pea) 


D È EIEE, y) 
Bje@j veB; 0 
= DX ko)p.@)f(a)h(e, 2) 


In case h is a function of x onl. 
equation (9.2.5) 


(9.2.6) 


y, i.e., h(w, x) = g(x), we have from 


X Sa)g(x)Pe(x) = E fx) (Ejelg(x)]) P(x) 


Let f(x) = f(x) be the characteristic function of S;, and let h(w, x) = 
hj(w, £) = (2) + L(w, d(j, x)); then (9.2.1) can be expressed as 


N 
02-7) iaden E lela) + Ba(Lw, dj, 2) Psa) 


Since in most discussions £ will be fixed, we shall sometimes fail to 
exhibit it. Thus, taking for 


h the function whose values are given by 
h(w, £) = L(w, a) for a given a in A, we define 
(9.2.8) 74(@, a) = Ex{L(w, a)] 
(9.2.9) 


T(z) = A 7;(x, a) 
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Theorem 9.2.1. For a fixed £, there is a sequence of terminal-decision 
functions d, such that 


(9.2.10) lim p(E, S, dn) = e*(E, 8) = inf of, 8, d) 
uniformly in $, where 
N 
(9.2.11) "ES=È X [es(x) + 1“ @)P:@) 
j=0 zesj 


Proof. Fix j and x in (9.2.9). Then, for any n we can find a point 
a(j, x) in A depending on j and x such that 


1 
(1) rla, alj, 2) Se) + 


Since for a fixed n we can define da by da(j, £) = alj, £), satisfying 
(1) for each j and z, then for this dn 


1 
(2) plé, S, dn) < e*(E, 8) + = 
for all 8 as can be seen by substituting (1) into (9.2.7). Thus 
1 
(3) inf p(é, S, d) < oG, S, dn) S o*E, 8) +> 
a 


On the other hand, it follows from (9.2.7) that for all d 


(4) p(é, S, d) 2 5 L iye) + r*i(@)Pe(e) = E 8) 


j=0 zes; 
so that 
(5) inf p(E, $, d) > e*(E, $) 
d 


for all n and $, the theorem is proved. 

Theorem 9.2.1 states that, for any arbitrary truncated sequential- 
sampling plan $ and for a given $, there always exists (at least to within 
any e> 0) an optimal terminal-decision procedure. Morebren the 
Bayes risk for a given ¢ and arbitrary S may be taken to be p*(E, 8) (to 
be henceforth designated simply by p(é, §)) since this value can be ap- 
proximated to arbitrary accuracy by an appropriate choice of dn. 

It remains to show that, for a given §, there exists an optimal se- 
quential-sampling plan $*. A constructive proof of this fact is given 


as follows: 


and, since (3) and (5) hold 
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Let £ be fixed, and for any function h(w, xz) we write E;(h) for the 
expression in (9.2.4). We also write 


(9.2.12) U;(x) = ¢;(x) + 7*;(2) 
We define 
(9.2.13) an(z) = Uy(x) 


and, by induction backward, 
(9.2.14) a;(z) = min [U;(z), E;(a;41(z))] 
forj < N. More specifically, 
(9.2.15) en(t) = Ux(2) 
ay—i(z) = min [Uy (2), Ev—1(en(z))] 


@v—2(t) = min [Uy_o(z), Ey_2(ay_;(z))] 


ag = min [Up, E(a,(z))] 
where 
(9.2.16) Uo = inf E[L(w, a)] = int È Le, atl) 
aed aed w 
Observe that if we had performed all N subexperiments and obtained 
z= (a, +++, zy) then a 


(x) would represent the best we can do with 
this x. Also, both Uy_1(x) and Ey_,(Uy(z)) depend only on Ti, V2, 


ttt Sy, and, moreover, Uy_;(z) represents the best we can do with 
N-1 observations, and Ey_4(Uy(2)) represents the average of the 
an additional observation. Thus ay_1(2) 


Wo risks—the risk of stopping with N — 1 
observations and the avera, 


bse ge risk from an additional observation. 
Similarly, an —2() depends only on 2, ***, tyo and stands for 


e risk of stopping with N — 9 observations 


(9.2.17) S% = {z: Ux) > a(t) for i< 


J, U;(2) = e(x)} 
S* = (S%o, Sty, aa S*y) 
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Then $* forms a partition of the sample space X. into disjoint sets. 
That the sets are disjoint follows from the fact that, for any p < q, 
Up(x) > a(x) for all x e S*; while Up(x) = ap(x) for all z e S*p. More- 
over, every point of X belongs to some S;. This follows since, for any 
x we have, by (9.2.18) and (9.2.14), a;(x) = U;(z) for all j, with equality 
holding for j = N. Let r <N be the smallest non-negative integer 
for which the equality sign holds in this expression. Then xe S*,. 
Also the sets S*; (j #0) are clearly cylinder sets over K = {reJ: 


O0<r<j}. R 
Thus $* is a possible sequential-sampling procedure and is character- 
ized as follows: At the jth stage of experimentation j = 0, 1, +++, we 


compare the present risk U;(x) with the average risk a;(x) resulting 
from a continuation if at each future stage we did the best we could 
with the resulting observations. We stop sampling if U;(x) = a;(x) 
and take another observation if U;(x) > a(x). We shall show that 
S* is in fact a Bayes sequential-sampling plan. 


Theorem 9.2.2. The sequential-sampling plan $* defined by (9.2.17) 


is Bayes against £, that is, 


(9.2.18) p(é, S*) = p*(@) = min p(l, 8) 

and furthermore P 

(9.2.19) MORR? > a;(x)Pe(%) = ao 
j=0 ce S*; 


Proof. Let S = (So, Si, **"» Sy) be any arbitrary truncated sequen- 


tial-sampling plan, and let 


(1) T, = Re U Sry U +++ U Sw 
Define 
1 
(2) g(r) = be & a;(x) Px (2) + D a(x) P(x) 
foo reS; zeTr 


= FE ae) + E Pl) 


j=0 re Sj 
so that P 
6) =È È alt) Pe@), g(0) = a 
j=0 72 Sj 


Now the set 7'-41 is defined by + S; dong =O, 1,.* *% 180 that Trp 
depends only on a1, ***) tr Hence letting yr41 represent the charac- 
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teristic function of the set 7,4; and taking y,4, for f and a@r41 for g 
in (9.2.6), we obtain 


(4) = a 41(2)P:(x) = > Elany (2)] Plx) 
ze Try SES 


Thus g(r + 1) can be written 


6) +) =D E ayle)P(a) + Edler a(a)IPeCe) 


j=0 ze Sj 


However, by (9.2.14), a (£) < B,[e41(z)], so that 


© EDZE X aj(e)Pea) + X ax(a)Pe(x) = g(r) 


forr=0, 1, ---, N— 1. Thus 


g(r) is an increasing function of r. 
Again by (9.2.11) and (9.2.12) 


N 
(7) pl, 8) = 2 2 U;(x)Pe(x) 


and in view of (9.2.14) and (3) 


N 

(8) af, 8) = ÈZ ajla)Pila) = g(W) 
j=0 ze Sj 

Hence, for all $ we have 


(9) P, 8) > g(N) > g(0) = 
However, if $ = s* 


a9 
, the inequality, precedin 


for te 7,44, since U,(x) > a,(x) by (9.2 
Ela, 41(x)] by (9.2.14), 
N 


g (6) becomes an equality 


-17), and hence a(x) = 
Thus (6) becomes an equality forr = 0,1, +++, 


— 1. Also inequality 8 becomes an equality since U;(x) = a;(x) for 
xes; by (9.2.17). This completes the proof of the theorem. 
Theorem 9.2.3. The Bayes risk p 


; *(€) for a truncated sequential pro- 
cedure is a concave function of &; 


i.e., for any & and and any a, 


e*(aki + (1 — af) > op*(&) + (1 — a)p* (ta) 


Proof. Let = a; + (1 — a)to. 
(1) 


Then for any $ and d 


PẸ, S, d) = apli, 8, d) + (1 — a@)p(ts, S, d) 
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as can be seen from (9.2.1). But for all S and d 


(2) 
PES, d) >a inf (hs, Sd‘) +a) inf plea, S, d) 
(S, d)e5 XD (S, d)e5 XD 
and hence 
6) >&= inf pE, S, d) > a*l) + (1 — a) (o) 
GdeSxXD 
PROBLEMS 


9.2.1. Let y be a binomial random variable with poly = 1) = w, poly = 0) = 
1 — w, 0 <w <1, and let t() = 1 for all w; i.e., ¢ is rectangular on the interval 
(0, 1). Let Sy be the following sequential-sampling plan. Take independent ob- 
servations on y in sequence, and stop as soon as m > 1 ones have been found, but in 
any case stop after N observations. F: ‘ind the optimal estimate of w if, for any es- 


timate a, 0 <a <1, Llw, a) = 200 — a)”. 
9.2.2. Consider the same situation as in Problem 9.2.1, but, instead of employing 


Sy, construct the optimal sequential procedure $*y for N = 5 if the cost per obser- 
vation is 1 unit. 


9.3. Bayes Sequential Procedures for Constant Cost and Identically and 
Independently Distributed Observations 
cial case in which the random variables 
gi (i = 1, 2, «++, N) are independent [see Definition 3.11.4] with the 
same probability distribution fo and c;(x) = je where c is a positive 
constant which, by a change in scale, can and will be taken as unity. 
We shall show that for this case the optimal sequential procedure 
S*y truncated at N observations can be described by means of a de- 
creasing sequence of regions in the space = of all distributions £ on @ 
and in terms of these regions extend these procedures to the non-trun- 
cated case as well. More specifically, we shall show that: (1) All the 
information contained in a sequence of r subexperiments is summarized 
in the a posteriori probability distribution to be designated by £r, in 
the sense that we can compute £1) knowing only & and 2,41. More- 
over, ¢, is a point in Z. (2) There exists an infinite sequence Zo, Z1, 
Eo, -++ of subsets of = such that Zo = EDZ,D2:,5--: with the 
property that, if, for a given § s*, is the optimal sequential-sampling 
plan truncated at m observations where the observations are to be 
taken one at a time in sequence, then $*, requires taking no observa- 
tions if and only if £€ Zm (3) For any &, the risk of the sequential plan 
S*y approaches a limit as N — ©, and the value of this limit is the 


We shall here consider the spe 
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same as for the sequential plan S* defined by a stopping region =* 
where =* is the common part of Z; (i = 0, 1, 2, ---). ; 

In view of the above, S*y can be characterized in the following man- 
ner. Suppose we have performed j subexperiments and computed £y. 
Then we are in the same situation as we were initially except that £ 
has become £; and the optimal continuation is now the sampling plan 
S*yv_;. Hence we stop experimenting if and only if je Enj With 
the non-truncated sequential-sampling plan $*, we stop experimenting 
if and only if & e 3*. 

The details of the above characterizations and extensions of Bayes 
sampling plans are incorporated in Theorems 9.3.1 and 9.3.3 below. 
Conditions under which the optimal sequential procedure is neces- 
sarily truncated, or is of fixed sample size, are also given in this section. 

Let £ be fixed, and let the symbol é; stand for the a posteriori proba- 
bility of w, given the observations z1, Tə, +++, 23. (Note that the sym- 


bol exhibits only the number and not the values of the observed co- 
ordinates of a point x. This, however, w. 


ill not cause any confusion.) 
Then 
&(w) 2» Poly) 
9.3.1 Ej =p, y= yore 
( ) lw) E pa EOD) Y = (yı » YN) 
y e k(x 


Now, since the z;’s are independent and have the same distribution, 
N 

say folti), we have p(x) = JI Jo(e:). Let « = (u, v) where u = (v1, 
i=1 


ves, 2j) and v = (aj44, + tN), and let V be the set consisting of all 


points». Then since > J] fu(x;) = 1, we have 


zeV i=j+1 
E(w) II Jolai) 
(9.3.2) E(w) = a > 
x I Solas) (6) 


which is clearly independent of N. 
By direct computation we also observe that 


Eloo ljg) 
(9.3.3) to) = EA, 


Thus the a posteriori 


probability distribution & 
probability distributio; 


becomes the a priori 
n for the next observati 


on (or observations). 
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More generally, knowing only £; and zj+1, we can compute the a pos- 
teriori probability for w, given all the z;’s from 1 to j+ 1. This, of 
course, no longer holds if the x's are not independent, since in that case 
additional information, viz., the conditional distribution of 241, given 
Tı, +++, £j, would be required for each w. In addition, since the z;’s 
are identically distributed, the functional form of £41(w) in (9.3.3) is 
the same for all j = 0, 1, ---.. The above can be summarized by say- 
ing that a given sequence of j observations results in a point = & in 
=, and an additional observation transforms ¢ into a different point in 
=, the transformation being given by (9.3.3). We shall designate this 
transformation by T so that equation 9.3.3 becomes £341 = T&;. 


inl 


Let g be a real-valued and bounded function on © (for example, 
g) = inf > L(@, a)é(w)), and for any $ £ let 


h(E) = Eolg(T8] = Blo(Té)] 


where, by definition, 


EUITH = E L g(TEE)foer) 


Consider now the conditional expected value of £341, given t1, t2, ***, 
zj. This can be written as 


(9.3.4) EATEN = E E (TENEO) 


Zj © 


as can be seen from the definition of the symbol #; and the fact that the 


xs are independent. Since the zs are also identically distributed, 


the above equation shows that, if we set t’ = &, then 
Bylg(T&)] = EITEN = hE) 


These remarks are to be borne in mind in the subsequent discussions. 
For any bounded function g defined on 9, the conditional expected 


value of q, given 21, t2, ***, Uj, CAN be written in terms of £; as 


(9.3.5) B,(q) = È lolo) 

In particular, 7;(x, a) [see 9.2.8] in this notation is given by 
(9.3.6) 7;(z, a) = È &(w) Le, a) 
and hence 


(9.3.7) 7*,(x) = inf ri(s, a) = ¥(&) 
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Note that the domain of y is the space =, and the functional form of 
w is the same for all j. Hence, for the problem under consideration, 
the risk p(&, Sy) [see (9.2.11)] for any truncated sequential-sampling 


plan Sy = (So, Si, +-+, Sv) can be written as 
N 

(9.3.8) pÉ, Sy) = 2 Z [I + W(&)1P (2) 
j=0 ze S; 

Theorem 9.3.1. (a) There is a monotonically non-increasing sequence 
of subsets of =, Zo = Z, Zı, Zs, ---, such that, for any N and &, the 
best truncated sequential procedure S*y = (S*py, S*,y, --+, Sty) is 
described by 


(9.3.9) Sty = {2:8 g En for r <j, genn} 

(b) If p*x(£) is the Bayes risk from S*y, then p*y(£) approaches a 
limit, say p*(£), as N > o, 

(c) If 3m = (To, Ti, ++ 


(9.3.10) 


*, Tm—1, Vm) is the procedure 


Tj = {at ¢=* for r<j, te=*} 
Va = C(To UT, U -++ U Tay) = {a:& ¢=* for Fém 
where =* is the common part of the sets =, (r = 0, 1, -+-) then p*(é, 
Im) — p*(é) uniformly in £ as m — ©, 

(d) In (b), the approach of p* nv (E) to p*(€) is uniform in E 


Proof. (a) We define functions ho, hy, +++, on Z as follows: 


(1) hol) = YE) = inf DY E@)L(, a) 


(see (9.3.7)) and by induction on 99> 0, 
(2) hj) = min [¥(&), 1 + Eja (TE) 


We shall later show that hm is in fact the risk function of a sequential 
procedure truncated at m observations, We also define 
(3) 


U; = j + y) 
(4) ann = Uy 
and by induction backward 
(5) 


ajn = min [Uj, Ej(o; 41, y)] 
forj < N. We note that 
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(6) ayy = N +¥(év) = N + (Tn) = N+ ho(Tén—1) 
anı, y = min [N — 1 + Yx) N + Ex—1(hħo(Tx—))] 
= N — 1 + min [¥(éyv-a), 1 + Ev—-a(to(TEv))!] 


= N — 1 + h(x) 
The last expression follows from (2) and the introductory remarks to 
this discussion. 
By a similar argument we have 


(7) ay, y = min [N — 2 + pen), N — 1 + Ev—2(u Ev—1))] 


= N — 2 + min [y(Ev—2), 1 + Ey—o(u(Téy-2))] 
= N — 2 + ho(Ev-2) 
and so forth. In general, we have 
(8) ay; y = min [N — j + v(En—3); 
N-g+tit Ey; (Tin-)) = N — j + hln) 


By Theorem 9.2.2 the optimal sequential procedure is characterized 
by [see 9.2.17] 

(9) S%jy = {atom < U, for r<j, oN = U;} 

Hence, writing r = N — (NV — r) and employing (3) and (8), we have 


(x: hy—r(Er) < WE) for r<j, hyl) = WE) 


(10) S*jy = 
Now consider the sets Z; in = defined by 
(11) z= (eh) = ¥O) 


Then the following identity holds: 


(12) {a:&¢2=y—r for T <j, eE- 

= {x: hy—r(&r) 7 pE) for r<j, hyl) = VE) 
But, by (2), if hy@) # ¥@), then 
sign in the second expression 1n 
This implies that the sequential proc 
expressed as 


(18) S*jy = (2: ér EN 


nj(t) < YE) for all j, so that the # 
(12) can be replaced by the < sign. 
edure $*y defined by (10) can be 


for r<j, Ẹ§ EEn} 


which proves one part of (a). See as , 
To show that the sets Z; are decreasing with j is equivalent to show- 


ing that h,(£) is a decreasing function of j for a fixed £ For suppose 
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= z : = d 
, and let €e=;4; and &¢=;. Then h;p(¢) = YE) an 
ue eerie, h;(é) < = j+1(¢) which contradicts the assump- 
tion that h(E) > hj11(€). Now from (6) 


(14) hj) = an—;, n + (j — N) 
Replacing N by N + 1 andj by j + 1, we get 


(15) hj (ë) = aN—j, N41 + (j — N) 
so that 
(16) M(E) — hjal) = avi, = any, was 


It remains to show that the right side of (16) is non-negative. This is 
equivalent to showing that a,y decreases as N increases. We have 


(17) en, N+ = min [ayy, Ey(ay41, v41)] < ayy 


Applying an induction backward, 
k>j. Then 


(18) 


we assume that œk, m41 < ogy for 


%j, N41 = min [U;, Ejla, v41)] 


< min [U;, Ej(a;41, v)] = ayy 


which completes the proof of part a of Theorem 9.3.1. 


Proof. (b) Since for a 


given £ a possible sequential procedure trun- 
cated at N + 1 is to di 


scard the (NV + 1)st observation and employ 
the procedure 8*y, it must follow that e*v4i(€) < p*y(é). Thus p*n(€) 
for a fixed £ is a decreasing function of N and hence converges to a 
limit, say p*(é). 


Proof. (c) To prove part c of Theorem 9.3.1 we need the following 
two lemmas. 

Lemma 9.3.1. Let S*;y and T; be defined as in (9.3.9) and (9.3.10) 
respectively. That is, 


Stin = (2: ¢Ey_, for r <j, bežna} 
T; = {x &, fue 


for r<j, tem* 
Then . 


Proof. We define 


Kin = (x: t eRy_j}, Kj = (x: ġe z*} 
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Then 
Stin = Ky — U Kn 
r<j 
T; =K;- UK, 
r<j 


Now the sets Z, are monotonically non-increasing, and therefore 


lim Kjy = K;, lim U Ky = UK, 
Noo Noord ra 


which proves the lemma. 


Lemma 9.3.2. Let $y be any sequential-sampling procedure resulting 
in a risk p(¢, $y) for a given &, and let Sm with risk p(E, Sm) be the trun- 
cation of Sy at m < N; i.e., Sm is the procedure that follows Sw up to 
m — 1 observations but always terminates at the mth observation. 


Then 


M 

(19) pl, Sm) < P(E, Sw) [: ee 

where 

(20) M = sup Llo, a) 

Proof. From (9.2.11) we have, since 7*;(x) is non-negative for all j 
and q, 
N 
(21) pl, Sw) = DL G+ TOP) 
j=02eS; 


> FL G+ AAS +m E Pre) 


j=0 zeS; 
where Wm = C(So US: U +++ USm-1). Also 
m—1 
(22) PE 8m) = Do E G+ 7*s@))Pe@) + 2 (m + 7¥ n(x) Pe(a) 
j ze m 


=0 ze Sj 


< pl, Sv) + M 2 Pile) 
But, by (21), Do Pr(z) < 0G $y)/m. Substituting this expression 
ce Wm 


into (22) yields (19). This completes the proof of Lemma 9.3.2. 
Now, for a fixed &, let fim be the characteristic function of the set 
S*;y in (13) for j < m and gmN be the characteristic function of the 
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set C(Son U Siw U ++- U Sm—1, N). Also let Smy be the truncation 
of Sty atm < N. Then, if p*nw(é) is the risk of Spy, 


m—1 
(23) o*m) = D È fine) Ua) Pla) + DO gny (x) Um(2)Pe(2) 
j=0 z z 


Now, by Lemma 9.3.1, f(x). f*;(2) and gmy(z) — g*m(x) as N — o, 

where f*; is the characteristic function of the set T; [see (9.3.10)] for 

j <m and g*» is the characteristic function of the set Vm. Hence, 

by Theorem 3.11.5 and the fact that for a procedure truncated at m, 
m 


P(x) can be taken to be >> (w) I fo(x), 


m—1 


(24) lim nE) = Z ESEU) + E g*n(x)U n(x) P(x) 
ee j=0 z 


= plE, Im) 


Now p*y(&) < p*mn (£) for all N. Hence, applying Lemma 9.3.2, we 
get 


M 
(25) P*n(#) < p*mn (£) < p*n(ë) ( + =) 
and, letting N — , we get 


M M? 
(26) p*(E) < pl, Im) < p*(£) ( + A < p*(E) + — 
m m 


Consequently p(t, 3m) 3 p*(E) as m > œ 


, and, moreover, the con- 
vergence is uniform in £. 


Proof. (d) The truth of part (d) of Theorem 9.3.1 follows from the 
inequalities 
(27) 


M 
P*O < owl) < ole mw) < oO (14 5) 


Theorem 9.3.2. Let p*y 
optimal sequential procedu 


(9.3.11) 


(€) for a given £ be the Bayes risk from the 
re S*y truncated at N. Then 
P*n (ë) = min [¥(8), 1 + E(p*y_4(T8))] 


Proof. From (9.2.19) together with (8) Theorem 9.3.1 with j=N, 
we have 


(1) e*x(é) = aon = hy(t) 
But by (2), Theorem 9.3.1, 
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(2) hy(€) = min [Y(é), 1 + E(hw-1(78))] 
Combining (1) and (2) yields the theorem. 


Corollary. If, for a given ¢, p*(é) is the Bayes risk from the optimal 
non-truncated procedure, then 


(9.3.12) p*(t) = min [¥(), 1 + B*(T8))] 


Proof. This corollary follows from Theorems 9.3.2 and 9.3.1. 

For the type of decision problems under consideration, Theorem 9.3.1 
describes the stopping regions S*;y of the optimal sequential proce- 
dure truncated at N observations in terms of sets of points x e X whose 
first j coordinates result in a posteriori probabilities which are points 
in a specified region Z; of =. This suggests an intrinsic characterization 
of $*y in terms of regions in = without any direct reference to the space 
X. This is often advantageous, especially when = is the fundamental 
probability simplex in h space; i.e., 2 consists only of h elements, and 


h is not large [see Chapter 10]. 

Suppose we are given an a priori probability on & and wish to em- 
ploy a sequential procedure truncated at N observations. Assume that 
the regions =; (j = 0, 1, +++, N) have been determined. (Note that the 
characterization of these regions on a given space = depends only on 
the space A of possible actions, the loss function L, and the distribution 
function of a single coordinate.) 

Now the set Zy is such that, if the initial ¢ is in Zy, there will be no 
advantage in taking N or fewer observations before reaching a deci- 
sion; in this case the risk is p*(§) = ¥(é). On the other hand, if the 
initial ¢ is not in Zy, we will incur a smaller risk, namely, p*y(é) = 1 
+ Elp*y_1(78)] by taking an observation and then following the opti- 
mal procedure S*y—1. Similarly the set Zy-1 is such that, if & = Té 
is in Zy_,, we will gain nothing by expending any or all of our N-1 
remaining observations, whereas, if é is not in Ey- we will run a 
smaller risk by taking an observation and following S*y—2 It should 
be emphasized moreover that the taking of one observation has the 
effect of transforming the initial distribution & into a new distribution 
& = Tg, and that in terms of the new £, we are back to the original 
problem except that N has been reduced by 1. Keeping this point in 
mind, the extension of this argument is clear and justifies the descrip- 
tion of the optimal sequential procedures given in the introduction to 
this section. 

If a given problem does not specify an upper bound to the number 
of subexperiments one might perform, then the optimal sequential 
procedure for a fixed & is s* defined by the limit of the sequential pro- 
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cedures Jm [see (9.3.10)] as m —> œ. However, in certain cases $* is 
necessarily truncated and in fact may be of fixed sample size. The 
conditions under which this holds is given by the following theorem. 


Theorem 9.3.3. If ¥(é;) — 0 uniformly in z, i.e., VE) < of) for 
all z and j, where 9(j) — 0 as j > œ, then the optimal sequential 
procedure is truncated. If y(E;) is a function of j only, the optimal 
sequential procedure 8* is a fixed sample-size procedure. 


Proof. By (7), Theorem 9.3.1, we have that for all N 
(1) Aj(Ev) = min [(Ey), 1 + Ey(hj_s(Téy))] 


so that, for any j, h;(Ev) = Y(n) whenever Y(n) < 1. Since p(y) <1 
for sufficiently large N, say N > No, we have h;(Ey,) = W(Ey,) for all 
J; ie, No eZ; for all j, and hence ¿No e=*. Thus the optimal pro- 
cedure does not require more than No observations. 

Assume now that ¥(£;) = g(j); then, for any procedure $, 


@) 0&8) = DD + oP) 


= E PASII + gG = min [j + gH] = jo + glo) 


Since the fixed sample-size 
Jo + 9(Jo), it is clearly optim: 
on &. 

As an illustration, 
w of a “success” 
independent tri 
k > 0, and whe 


procedure with jy observations has risk 
al. Note that g(j) will in general depend 


consider the problem of estimating the probability 
from a binomial distribution P(x) from a sequence of 
als where for any estimate a the loss is k(w — a)’, 
ve $ is rectangular over the interval (0<w<1). The 


probability of r successes in a sequence of j trials is : a(l — w) 


(r) if sampling terminates with j trials is 


$ 
f at AY — w) dy 
(9.3.18) dr) = Bl) = 1 


a(l — o) ™ dy 
0 


so that the Bayes estimate d 
[see Section 11.2] 


LOT GF + ONG + pi 
TGH +G =r 1) j+2 
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The Bayes risk y/(&;) is given by 


1 
bf (o = ERPE — 0) de 


1 
f a(l = o) dw 
0 

1 > 
k r+2/7 — ay 
Ef o (1 — o) T dw p 

= —k 
1 f +2 
f w(1 — o) 7 dw 
0 


„iL kE ER] 
~ "G+ 3)G + 2) j + 2 


Met DG—r+D 2 G+ D? 
=- G zG) GHG) 


(9.3.14) vE) 


since 0 <r <j. Thus ¥(&) > 0 uniformly in r and the best sequen- 


tial procedure is truncated. 
On the other hand, if the loss is taken as k(w — a)?/o(1 — w), the 


Bayes estimate is ) 
E;(/(. = v 
(0.3.15) a) = Taol o) 


tations, and ¥(£;) = k/j for all v. Thus 
for this estimation problem is a fixed 


z 
j 


as can be seen by direct compu 

the best sequential procedure 

sample-size procedure. 
PROBLEMS 


concave function of £. Hint. Show that it is a 


9.3.1. a *(TE)) is a ci that i 
ea i procedure where the first observation is cost- 


risk function of an optimal sequential 


less, N . 
9.3.2. Let {Gy} be a sequence of truncated sequential games of the type consid- 


ered in Section 9.3, and let {G’x} be another sequence of games similar in all re- 
spects to Gy except that in each G'N the loss due to the terminal decision is assumed 
to be zero whenever all N observations are taken; i.e., y(n) = 0 for all € and z. 
Pr 

(D The Bayes sequential procedures (s'*v} in these modified games are charac- 
terized by regions Z’ = =, and a sequence of increasing regions Z1, Z2, +>: in z such 
that, for a given £, S'*w will require taking no observations if and only if £ ely. 

(2) Ely c Ey where Ey is the corresponding stopping region in the original game 


Gy. 
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i ll N, 
* is the Bayes risk for 8’*y, then r*y(t) < r*y4i(t) < ¥(é) for all N, 
a ona AG oe as N — ©, where by definition r*o(t) = 0 for all ¢. 
4) r*y(€) = min [Y), 1 + E(r*y-a(Té))]. , 
i Y a is the Bayes risk in Gy and p*(t) = yim e* (8), then r*y(£) < p* y(t) 


for all N and r*(é) = p*(é). 


(6) Let =’* = U &;, and let Jm = (T'o, «++, T’m—1, V'm) be the sequential pro- 
jal 


cedure defined by 
1; = {uit ¢e* for r<j, tez’*} 
V'm = C(T’0 U Th U +++ U Tm) 


Furthermore, let 7*(¢, 5‘m) be the risk from I'm in the game G’,. Then r(£, Ym) > 
r*(t) asm — œ. Hint. Corresponding to the functions ajy and hj of Section 9.3 
define a’jy and h'; with h'o = 0 for the modified game, and prove by an induction 
backward that a’jy is an increasing function of N and hence that h’; is an increasing 
function of j. Also prove by induction that h'i(E) < hil) for allj and £. The remain- 
ing steps are similar to those employed in the proof of Theorem 9.3.1. 

We observe that, for any N, p*y(é) is an upper bound for p*(£) and r*y(é) is a 
lower bound. Also for any N we have Z'n c E*C En. 
9.3.3. Given a decision procedure involving an action space A, a parameter space 
Q, a loss function L on Q X A, and two independent random variables f and g with 
respective distribution functions Po and qu for each we. Characterize the trun- 
cated Bayes sequential procedures in terms of regions in the space Z if at each stage 
the experimenter has the choice of (1) taking an action without further experimen- 
tation, (2) taking an observation on the random variable f at a cost of c1 units, (3) 
taking an observation on the random variable g at the cost of co units. The obser- 
vations at all times are independent. Hint. For any fe Z let Tyt be the a posteriori 


probability if a value on f is observed, and let T,& be the a posteriori probability if 
a value on g is observed. Define 


(1) W = inf Z Llo, ae) 
(2) holt) = Ye) 

and by induction onj 

(3) 


Ay) = min (YQ, c + Ehja(Ty&), c2 + Ehj4(T,t)] 
Also for each j define 


(4) Zo = (Ea) = WO} 
(5) Er = (EE =o 4 Ehj_(Tjt)} 
(6) 


observations, (G = 0, 1, ses 
a posteriori probability distrib j observati TA stale 
the best possible action witho y ations. If & © Zo, mj 


If ge Zj, y—j, take an 
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observation on the random variable f; if © Zg, v—j;, take an observation on the 
random variable g; in either case compute £1, and follow the sequential procedure 
defined by the regions Zo, n-j—1, Zf, N-j-1, and Zg, N—j—1- 


9.4. Bayes Sequential Procedures for Finite Q 


In this section we shall study sequential games of the type considered 
in Section 9.3, except that we shall restrict @ to be finite, say, 2 = (1, 2, 
-++, h). In addition, we shall replace, as we may, the terminal action 
space A by a subset W of the h-dimensional Cartesian space, where the 
jth coordinate of a point w = (wi, W2, +++, Wa) e W is LG, a), and as- 
sume that W is closed and bounded. In this equivalent game, a ter- 
minal-decision rule is a function d with values d(j, x) = w(j, £) mapping 
the space J X X into W where J = (0, 1, ++ +). 

Paralleling the definitions given in Section 9.2, we define 


h i 
Tuwe p 


(9.4.1) rj(£, W) = = L = L wit) 
D ieo 
t=1 k=1 

(9.4.2) r4j(z) = inf 1y(, W) = ¥:) 


represents the a priori probability dis- 


where € = (£(1), «++, &(h)) £2 
ers E= CU) bability (or density) of x, when 


tribution over Q and f,(ax) is the pro 


nature is in state 7. , x é 
Since W is closed and 7;(x, W) is linear and hence continuous in the 


Wi, there exists a terminal-decision procedure d* with d*(j, x) = w*(j, £) 
such that 7)(x) = 7;(v, w*). Hence, in view of Theorem 9.2.1, we have 

Theorem 9.4.1. For a fixed § there exists a terminal-decision func- 
tion d* such that 


(9.4.3) p*(E, $, d*) = min e*(, S, d) 


uniformly in $ where § = (&(1), +5 E(h)). 
9.4.2. Let S*y be the Bayes sequential procedure truncated 
at N, and let 8* be the Bayes non-truncated procedure. Furthermore, 
let p*y(E) = pË, S*v, d*) and p*(&) = (€, 8*, d*) be the corresponding 
risk functions. Then p*y(§) and p*(€) are continuous and concave 
functions of £ where £ = (E(1), +5 &(h)). 


Theorem 
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Proof. Since W is bounded, 


h 
(1) VÈ) = min > wl) 
weW i=l 
is a continuous function of § by Theorem 2.2.7. Also, by Theorem 9.3.2 


(2) e*v(&) = min [¥(€), 1 + E(p*w_s(TE))] 


for all NV, so that, for N = 0, p*o(&) = ¥(E) is a continuous function of 
§. We shall prove this theorem by an induction on N. Assume that 
p*,(&) is a continuous function of £ for all k = 0,1,---,N—1. Then 
p*y—1(T8) is a continuous function of TE, where 


(3) TE = an = a(l), iii Ez (h)) 
and j 
9 hO = Sere 

L SEDO) 

j=l 


Now for a fixed zı, È., is a continuous function of £ = (£(1), tes, 
£(h)), so that, for a fixed x1, e*v-1(TE) = gÈ, zı) isa continuous function 
of §, since the composition of a continuous function with a continuous 
function is continuous. It remains to show that Elg(, 21)], and hence 
min [¥(€), 1 + H(g(E, z1))] is a continuous function of £. Now gÈ, 21) 
is uniformly bounded by some number, say K, so that 


h 
(5) E(p*wa(T8)) = DD Silerdg(&, sdt) 
t=1 71 


h 
SUE Kimya) = K 
i=l Tı 
Thus the series (5) converges uniformly in È since it is term by term less 


than the convergent series x E Kfi(as)E(i), and hence E(p*y_,(TE)) 
is continuous. i 

Now, if Q is a continuous function of u and 
tinuous functions of £, then Q(k(é), 
In particular, Q(u, v) = min (u, v) i 
so that, if we set u = YÈ) and v 


v, and if k and h are con- 
A(€)) is a continuous function of £. 
s a continuous function of u and v, 


i à =1+ E(o*y_,(TE)), we conclude 
that p*v(&) is a continuous function of £. This completes the induc- 


tion. The continuity of e*(§) follows from Theorem 9.3.1(d) which 
states that p*y() — p*(E) uniformly in £, 
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That p*y(€) is a concave function of Ẹ has been proved in Theorem 
9.2.3. The concavity of p*(&) is an immediate consequence of Theorem 
2.2.7, 


Corollary 1. If Q is finite, the functions h;(€) defined by equation 8 
of Theorem 9.3.1 are continuous and concave functions of § for all j. 
Proof. The truth of this corollary follows from Theorem 9.4.2 and 


(1) Theorem 9.3.2. 
Let Q be finite, and for each we W let T(w) be a set in Z defined by 


h 
(9.4.4) T(w) = {E:¥@) = D Ow) 

i=l 
That is, T(w) consists of all £’s for which an optimal terminal decision 
isw. Furthermore, let 


(9.4.5) Am(W) = Em N T(w) 


where Zm is defined in (11), Theorem 9.3.1. Then A,,(w) is a set in Z 
such that, if £ e A,(w), the best procedure truncated at m observations 
is to decide w without experimentation [see Theorem 9.3.3]. 

Theorem 9.4.3. The sets Am(w) defined in (9.4.5) are closed and con- 
vex. If wi, woe W with wı # We, then An(wi) and Am(W2) have no 
interior points in common. 


Proof. Both T(w) and Zm are defined by equations involving con- 
see Corollary 1, Theorem 9.4.2], and consequently 


tinuous functions [ 5 ; i 
they are closed sets, and, since the intersection of closed sets is closed, 


it follows that Am(w) is closed. 
Now let à; and àz belong to An(w), and let 


E=aht(1—a), O0SaS1 


To show that Ẹ e Am(w) also. From (9.3.11) and (1), Theorem 9.4.2, 
we have 


h h 
(1) p*n(&) < vE) < E Ow: = a Dw; + (1 — a) 2 Aa(t)ws 
~ i=l i=l i= 


But, since 41, A2 € Am(w), then by (9.4.4) and (11), Theorem 9.3.1, 


ll 


(2) x tiwi = 4A) + (L — a)y) 


i=l 


ap*m(d1) + (1 — @)p*m(hz) 
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Also, by Theorem 9.4.2, p*n(§) is a concave function of §. That is 
(8) p*m(S) > ap*m(d1) + (L — @)p*m(Az) 
We therefore have , 
(4) an) + (1 a)n) < p*n(E) < ¥(E) < È EQ@)w; 

= ap*m(M1) + (1 — &)p*m(Az) 
Thus ¥(§) = > (Zw; and e T(w). Also, from (4), p*,(&) = (È), 
and hence, ie Gn, Theorem 9.3.1, Ë € Zm, which completes the proof 


of the convexity of A,,(w). The proof of the last part of the theorem 


follows the same argument as that employed in the proof of Theorem 
6.3.1. 


Theorem 9.4.4. Let T(w) be defined by (9.4.4), and let A(w) = 
=* N T(w) where =* is the intersection of Zm for all m. Then A(w) is 


closed and convex, and for any wy, wee W , the sets A(w;) and A(w2) 
have no interior points in common unless w; = Wo. 


Proof. The truth of this theorem follows from Theorem 9.4.3 by 
noting that the intersection of closed convex sets is a closed convex set. 


[See Problem 2.2.1.] 
PROBLEM 


9.4.1. Determine the regions An(wi) for m = 0, 1, 2, 3 in Example 1 of Section 
10.7. 


CHAPTER 10 


Bayes and Minimax Sequential 
Procedures When Both 
2 and A Are Finite 


10.1. Introduction 


The general characterization of Bayes sequential procedures, for the 
case where Q = (1, 2, +++, h), A = (l, 2, 07 k), where the observa- 
tions are independent with the same distribution, and where the cost 
per observation is constant, follows directly from the results of the 
previous chapter and can be briefly summarized as follows: 

Let = be the space of a priori probability distributions on 2. Then 


= can be represented as an (h — 1)-dimensional simplex with points 


& = (E(1), £(2), «++, EA), EC) 2 0 for all Zand 289) =1. Let W be 


an h X k matrix with elements wi; = L(i, j) where L(i, j) is the loss if 
nature is in state į and the statistician decides j. We shall designate 
the columns of W by column vectors Wi, W2, ***, Wk SO that a decision 
j is equivalent to the selection of the column w; from W. For any se- 
quence of observations 21, ta, ***) Zr let E, = (&-(1), «++, &(h)) be 
the a posteriori probability distribution on 2 where 


(3) TL Solas) 
(10.1.1) &(s) = CSL nh = 0,1, +) 


EO IK) 
ii j=l 
and where f,(x;) is the probability distribution of a; (j = 1, 2, -+-), 


given that nature is in state s. 
First consider the class of Bayes sequential procedures $*y trun- 
cated at N observations. Then according to Theorem 9.4.3 there ex- 
261 
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ist k(N + 1) closed and convex regions Am(w;) (m = 0, 1, oe, N; 
j =1, +++, k) in = which completely characterize this class in the fol- 
lowing sense: Initially we are given a §. If §e Ay(w,) for some j, we 
make decision j without any observations. If £ ¢ Ay(w,) for any j, we 
take one observation and compute the a posteriori probability distri- 
bution £;. We then look at the regions Ay_;(w,). If & © Ay—1(wy) 
for some j, we stop taking observations and make decision j. If & ¢ 
An—ı(w;) for any j, we take a second observation, and so forth. In 
general, if we have not stopped with the first r — 1 observations, we 
take an additional observation and compute Ë. If & e Ay_,(w,) for 
some j, we terminate sampling and make decision j. If E, g Av_,(wj) 
for any j, we take an additional observation. The quantity &y will be- 
long to Ao(w;) = Zo N T(w,) for some j since every Ẹ belongs to some 
T(w;) and Zo = Z, so that a decision will be reached at the Nth obser- 
vation if none was reached before. 

The class of non-truncated Bayes procedures $* has a somewhat 
simpler structure since it is completely specified by only k convex re- 
gions A(w,), +-+, A(w,) in Z [see Theorem 9.4.4]. As in the previous 
case, we take observations in sequence and at the rth stage (r = 0, 1, 
2, +++) we compute Èr. If there exists a j such that £, € A(w;), we ter- 
minate sampling and make decision j. If, on the other hand, Ë ¢ 


Uaw), we take an additional observation. That sampling in the non- 
j= 

truncated case will terminate with probability one follows from the 
finiteness of the Bayes risk p*(£) [see, for example, proof of (b) in Theo- 
rem 9.3.1]. 

We see then that the problem of characterizing the class of Bayes 
sequential procedures under consideration is equivalent to the problem 
of delineating in the space = the convex regions A,,(w;) for the truncated 
case and A(w;) for the non-truncated case. We shall indicate here a 
step-by-step method, beginning with m = 0, for de 


: termining the bound- 
aries of the regions A 


m(W;), which will converge to the boundaries of 
A(w;), for the case h = k = 2, i.e., dichotomies. This method, as will 
be seen, can be generalized to any h and k, but a detailed description 
which would take care of the various pathological cases that might 
arise seems hardly worth the effort in view of the computational diffi- 


culties involved even in the special case to be considered. No direct 
method has as yet been developed for determining the boundaries of 
A(w;) in the general case. However, for h = k = 2 such a method ex- 
ists, at least from a theore 


tical point of view, and will be discussed in 
some detail. 
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10.2. Method for Determining the Boundaries of the Stopping Regions 
for Truncated Sequential Dichotomies 


0 Wi2 
Let Q = (1, 2), A = (1, 2), W = - That is, we are faced 
w21 


with two alternative hypotheses H, and Hz where under H, the proba- 
bility distribution of an observation zs is file) (s = 1, 2, +) and 
under Hy it is fo(vs). If H; is true and we decide H; the loss is w;; for 
i Æ j and zero otherwise. Let È = (¢, 1 — ț) where ¢ is the a priori 
probability for Hı, and hence 1 — ¢ is the a priori probability for He. 
Then, writing any function of § as a function of ¢, we have 


(10.2.1) p*o(t) = v(t) = min [fwi (1 — $)wai] 
and hence, setting yo = 50 = W21/ (w21 + w12), we have 
(10.2.2) T(w:) = [ôo <¢< 1, T(w2) = [0 < ¢ < vo] 


Now let p*,(¢) be the risk function from the optimal sequential 
procedure truncated at m observations. Then by Theorem 9.3.2 


(10.2.3) p%n(t) = min {¥(), 1 + Ele*ma(PO}} 


Consider the sets Am(w:) and Am(w2). Since they are closed and con- 
vex, they must be closed intervals. But clearly ¢ = 0 is an element of 
Am(We) and ¢ = 1 is an element of A,(w:). Thus there must exist two 
values of ¢, say ¢ = Ym and [ = ôm with Ym < ôm such that 


(10.2.4) Am(W:) = [im < ¢ < 1} Am(W2) = [0 Sf < Yal 


Thus it remains to determine Ym and ôm for m = 0, 1, 2, +++. To this 
end we prove the following theorem. 

Theorèm 10.2.1. A necessary and sufficient condition that ¢ = ¢ be 
a boundary point of the set Am(W;), j = 1 or 2 is that 


(10.2.5) Wa) = 1+ Elom (T)] 


Proof. That the condition is necessary follows from the continuity 
of the risk function p*m(t) [see Theorem 9.4.2] which implies that for 
a boundary point the risk of making an optimal decision without ob- 
servations must equal the risk of taking one observation and employing 
the optimal sequential procedure from then on. To prove the suffi- 
e observe that E[p*n—i(T's)] is a risk function 


ciency of the condition, Wi : 
of a Bayes sequential procedure truncated at m observations when the 
first observation is costless [see Problem 9.4.1] so that it is a continuous 
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and concave function of ¢. Moreover, this function vanishes at ¢ = 0 
and¢=1. , j 

Now suppose first that ¥(¢1) = $1we1, and consider any bs < i or 
which it definitely pays to make decision w2 without experimentation; 
i.e., 
(1) tawai < 1 + Elo*m—(Tto)] 
Then if we multiply (10.2.5) by « and (1) by 1 — a where 0 <a < 1 
and set ¢ = af; + (1 — a)tz we get 
(2) twa < 1 + aElo*m(T$:)] + (1 — a)Elo*m(Tt2)] 

< 1 + Elp*na(TS)] 

Thus, for all ¢ < {, 


wai < 1 + Elo*m— (TE) 
and, since equality holds for ¢ = ¢, it must follow that, for a ¢ in a 
small neighborhood beyond ¢;, 

swe, > 1+ Elp*n—1(T)] 


A similar argument can be applied to the case yt) = (1 — $1)wre- 
Thus either ¥(¢) < 1 + Elp*n_i(70)] for all §, i.e., there exists no ¢ 
for which it pays to take observations, or the curve y = (ç) will inter- 
sect the curve y = 1 + Elp*n_s(TO)] at exactly two values of ¢ which 


are the boundary points of the stopping regions A,,(w;) and Am(W2). 
[See Figure 13.] 


y 


/ 
d = ye) 
ib 7 s 


N 
| 
| 
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| | 
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> 
Am(wo) Tn = An(w;) 


Figure 13 
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We see then that the problem of determining Ym and ôm, as well as 
p*n(¢), reduces itself to the problem of obtaining the functional form 
of Elp*m—1(T¢)]. The latter is in turn solvable if we know the func- 
tional form of p*m—({) and are able to compute the expectation of 
P*m—1ı(T¢) for each ¢. 

Assume that the function p*m—ı(¢) is known, and let y be the generic 
symbol for the observations z;. Then the expectation of p*m_i(T'f) is 
given by 

hwy) ) 
* = * es SG 
(1026) Elem aM = t È pwa a Fa- oo. 


% chy) 
a aa P +4- si) 2 


which is an explicit function of ¢. Now, for m = 1, p*o() given by 
(10.2.1) is a very simple function of ¢ so that Elp*o(Ts)] can be obtained 
from (10.2.6) in a straightforward manner. Knowing Elp*o(To)], we 
can compute p*;(¢) from (10.2.3) and hence Elp*,(T¢)] from (10.2.6) 
and therefore p*2(¢) from (10.2.3) and so forth. 
It is to be observed that, while the averaging process involved in 
(10.2.6) is laborious from a computational point of view, the fact that 
the determination of the stopping regions and the Bayes risk involves 
nothing more complicated than taking expectations is of theoretical in- 
terest. Also this method can be considered as an iterative procedure 
for obtaining the Bayes risk and stopping regions of the finde seacasta 
sequential procedure. It yields at each stage an upper bound for p (2) 
and upper bounds for the regions A(wı) and A(we). A similar iterative 
procedure implied in Problem 9.4.2 yields lower bounds for these quan- 
tities. More specifically, we employ the equation [see Problem 9.4.2] 


rn (¢) = min WO), 1 + Elr*n (TH 


(10.2.7) 

where Efr*m—ı(T¢)] is given by (10.2.6) with p* replaced by r* and 
where 

(10.2.8) r¥(¢) = min YO), 1] 


to obtain increasing lower bounds for p*(¢); and the equation 


(10.2.9) WO) = 1 + EC na(PO) 


to obtain increasing lower bounds for A(w,) and A(wə). 
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PROBLEMS 


10.2.1. Let y be a binomial random variable taking on the values 0 and 1, and 
0 15 
let fill) = %, fl) = 34, Let W = É 


ay Determine the regions A,,(w1) and 
Am(we), and compute p*m({) for m = 0, 1, 2. 


10.2.2. Consider the same problem as 10.2.1 with W = ( 


15 0 6 
the regions Am(w;), j = 1, 2, 3, and compute p*m(¢) for m = 0,1. Note that we have 
here an entire interval in which hy(f) = ¥(¢) = 1 + Eho(Td). 

10.2.3. Compute r*,,(¢) for Problem 10.2.1 for m = 0, 1, 2, and obtain y'm and 
ô'm and hence A’m(wi) = [ôm << 1], A'n(we) = [0 < t < Y'm] from equation 
10.2.9. Plot r*2($) and p*s(t) on the same graph, and indicate on the abscissa the 
four regions Ax(w1), A’2(w1), As(we), A’o(we). 

10.2.4. Let h and k be general, and let { be a point in = satisfying the equation 


VË = 1 + Elp*m—s(T#)] 


0 15 Ì3 7 
. Determine 


Furthermore, let V be the face of the simplex = of lowest dimensionality such that 


foe V. Then either £ is a boundary point of Am(w;) for some j or every point in 
An(wj)( V satisfies the above equation. 


10.2.5. What general conditions must a 3 X 3 loss matrix W satisfy in order for 
the stopping regions Am(w;) j = 1, 2, 3 to have the general shape indicated in Fig- 
ure 1 


Figure 14 


10.3. Method for Determining the Boundaries of the Stopping Regions 
for Non-Truncated Sequential Dichotomies 
Pied pa enlarge Sequential procedures $* for the dichotomy 
ed in i i 
eh € beginning of Section 10.2 are characterized by the two 
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(10.3.1) A(wı) = (ë < ¢ < 1), A(w2) = (0 < $<) 
with y < ô. Assume for the moment that y and ô are determined. Let 
j = 0, 1, 2, --+ represent the number of observations taken in sequence. 


At each stage we compute t; where 


t TI At) 
(10.3.2) $; j ea - ; MET 
tTA@) ++ a-e Hee) 
1 


i= 


i=l 


We continue taking observations as long as y < f; < ô. We stop as 
soon as, for some j = n, either fn > ô or fn S Y. In the former case, 
we accept H; in the latter case, we accept Ho. 

It is easy to see that the above-described procedure (if $* requires 
taking at least one observation) is identical with the following likeli- 


hood ratio procedure (called the sequential probability ratio test), 
Let 


(10.3.3) = IL fate) / Ate) 
sl — 7) _ a4) 
(10.3.4) Ar = Cpr È 0-9 


Observations are taken in sequence, and sampling continues as long as 
By < à; < Ap. Sampling terminates as soon as, for some sample size n, 
either \, < B; or Mn 2 A,;. In the former case H; is accepted. In the 


latter case Ho is accepted. 


We define 
y tog 2 
a damien) 
(10.3.6) a; = log Ay, —b; = log By 


are non-negative for all ¢ in the interval (y, 6).) 


N b 
aa Aa A he sequential procedure 8* can be defined 
J 


In terms of these quantities t! 
as follows: Continue sampling as long as —by < 2 zi <a. Terminate 


sampling and accept the appropriate hypothesis as soon as, for some n, 
a n 


a> ay or 2% < — br 
=i isl 
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We now turn to the problem of determining y and ô. It follows from 
the continuity and concavity of the Bayes risk p*(¢) that y and ô sat- 
isfy the two equations: 


(10.3.7) Wi2 


1+ Elp*(Ty)] 
(10.3.8) wo, = 1 + Elp*(T)] 


That is, at both of these points, the risk of making the appropriate de- 
cision without taking observations is equal to the expected risk from 
taking one observation and then going on with $*. It remains there- 
fore to obtain an expression for E[p*(7'¢)] for ¢ = y and ¢ = ô. 
Consider first the case ¢ = y. If we set ¢ = y in (10.3.2) we get 


yU — 6) 
(1 — y)6 


It is clear that for this case the first observation, to be designated by 
Y, will either terminate the sampling or result in a new sequential- 


(10.3.9) a, = log A, = 0, —b. = log B, = log 


probability ratio test with boundaries a’, = —z and b, = —(b, + 2) 
fay) 
where z = lo d-b, < : 
En 1(y) Š as 


For any sequential-probability ratio test defined by boundaries —b 
and a with a > 0, b > 0, let 


n 
(10.3.10) ra(—b, a) = (ds < ~b| Ho): = 1,2 


j=1 


Then 7(—b, a) is the probability that we accept Hı when He is true. 


We also have 1 — ,(—b, a) = P (È z; > a| Hu 
=1 


Theorem 10.4.1 below]. Further, let Esin | —b, a] stand for the ex- 


pected number of observations required to reach a decision when Hw 
is true, and let 


[see, for example, 


(10.3.11) pı(—b, a) = Eiin | —b, a] + w(1 — mı(—b, a)) 
(10.3.12) pa(—b, a) = Holm | —b, a] + woymo(—b, a) 


Then p,(—b, a) is the risk from the sequential-probability ratio test 
when H, is true. 


In terms of the above quantities we can express 
Elp*(T'y)] as i 
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(10.3.13) Elo*(Ty)] = vwi2P(e > 0 | A) 


+ È alb +2), -2)n@)I 
—by <: <0 


+ (1 —ylwaP( < —by|He)+ ÈE px(—© + 2), -2e 
—by <z<0 


fo(y) 
hi = . ns Se y w= 
where ga (2) = P fv: log Fw) z| He) 12 


Similarly, if we set ¢ = ô in (10.3.4) we get 


61 -— 
l=) _, bs = log Bs = 0 


(10.3.14 = log A; = | , 
) as = log As E ay x 


Hence the first observation will either terminate the sampling or re- 
sult in a new sequential-probability ratio test with boundaries a’; = 
by — z, —b'5 = —2. We therefore have 


(10.8.15) Ele*(T8)] 
= dfwieP(e > by| Ha) + E ala by Ane 
0<z<by 


+ (1 — d[waP@ < 0| H2) + È p2(—z, by — 2)92(2)] 
0<z<by 


We note that Elo*(Ty)] and E{p*(7'6)] are both functions of y and 6 
so that (10.3.7) and (10.3.8) are two equations involving two unknowns 
which, at least theoretically, can be solved. The major difficulty, from 
a computational point of view, is obtaining explicit expressions for the 
two risks pı(—b, a) and p2(—), a) as functions of @ and b. This, in 
general, involves the solution of fairly complicated “random walk with 
absorbing barrier” problems which is beyond the scope of this book. 
However, approximate expressions for pı and p2 can be obtained [see also 
Problem 10.5.5], and the underlying theory will be given below. We 
shall also consider a few simple situations where this theory gives exact 
solutions to pı and pə. Tt is of interest to point out that the expressions 
enclosed in brackets in equations 10.3.13 and 10.3.15 depend only on 


by = log LZP? and that equations 10.3.7 and 10.3.8 are linear in 
— ô 
va ) puted the quantities in the 


Hence, assuming that we com. tities 
we can solve for Wij = uly, 6) with ij = 12, 
as to keep by fixed obtain a whole se- 


Wiz and w21. 
brackets for a fixed by 
21, and by varying Y and ô so 
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quence of sequential procedures which are optimal for the loss matrix 
( 0 uly, 2) 
der (7, ô) 0 


0 3 
10.3.1. Let 2 = (1, 2); A = (1, 2,3); W = ( ). Prove that a neces- 
we 0 weg 
sary and sufficient condition that A(ws) be non-empty is that the line y = fue + 
(1 — $)wes intersect the curve y = p*(¢) where p*(¢) is the Bayes risk of the dichotomy 


wy 
2= (1, 2), A = (1, 2), W = 3 


PROBLEMS 


wrz w 


wy 0 
10.3.2. Assuming that A(ws3) in the above problem is not empty, characterize 


the regions of A(w;), A(w2), and A(ws), and derive equations for determining their 
boundaries. 


10.4. Some Theory in Sequential Analysis 


In the previous section we showed that the class of Bayes non-trun- 
cated sequential procedures for a dichotomy requiring at least one ob- 
servation is equivalent to the class of likelihood ratio tests which in 
turn is equivalent to the following procedures. 
cal random variable z which on the hypothesis 
ability distribution g, 


We are given a numeri- 

Ha (w = 1, 2) has prob- 

(z). Then, corresponding to any a priori proba- 

Wie 

bility ¢ for Hy, and any loss matrix W = : there exist two 

W21 

non-negative constants a and b. Observations Zi (i = 1, 2, +--+), are 

j 

and at each stage j we compute Z; = z We 

i=1 


= 
continue sampling as long as —b < Z; <a. We terminate sampling 
at the nth stage if n is the smallest value of j 


such that Z, > a or Zn 
< —b. In the former case we accept Hp, and in the latter case we ac- 
cept H4. 


taken in sequence, 


Let a and b be fixed, and let us consider the general situation where 
the probability distribution of z is Ju(2) with arbitrary w. We note that 
n (the sample size at which we terminate sampling) and Zn (the value 

n 


of >> z; when sampling is terminated) are both random variables whose 
i=1 


i 


joint probability distribution depends on w. We shall investigate some 
of the properties of this distribution. 


Theorem 10.4.1. If, for a given #, Pole = 0) ¥ 1, then there exists 
avwithO<p< land ac > 0 such that 


(10.4.1) Poln >N) <o for N> No 
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Proof. Since z = 0, then either Pa (z > 0) > 0 or Pa(z < 0) > 0. As- 
sume first that P.(z > 0) > 0. Then, since P,(z > 0) = lim P.(z > d), 
a0 


there exists a positive number d, say do, for which P,,(z > do) = po > 0. 
Choose an No such that Nodo > a + b, and define 


(1) WA redas 
Y2 = 2No+1 +++++ Zen, 
Yk = ZDN HF ZN k= 1,2, +-- 


The random variables y1, **+, Yx are independent with identical proba- 
bility distributions. Moreover, 


(2) Por > a+ b) > Pale > do, +++ 2N, > do) = po™ = p* 


Thus each y; has a probability of at least p* of exceeding a + b. 
Let E be the event that at least one y; (i = 1, 2, -+-, k) exceeds a + b. 
Then 


(3) Poln < kNo) > Po(B) 


For assume y, > a + b for some r < k. Then if the sequential test 
did not terminate before or at (r — 1)No observations, we have —b < 


r—1 r 


bD Yi < a, so that $ y: > —b + yr > a, and hence it must terminate 
i=l i=1 


with r observations. Now, from (2) 
(4) Poly Sa +b, +++, yx La + b) < (1 p* 


and hence P,(Z) > 1 — (1 — p*)*. Setting N = Nok +h (O0<h < 
No — 1), v = (1 — p*)“*, and employing (3), we obtain 


1 yê 
(5) 1- ( = =) W < Poln < kNo) < Pan < N) <1 


Finally, letting c = (1/1 — a haii we get 
(6) P.(n>N)<o fr N> No 
The argument for the case P.(z < 0) > 0 is similar. 


Theorem 10.4.2. Let E.(n|—b, a) be the expected number of ob- 
servations required for sampling to terminate, E.{Z,|—b, a] and 
E.[Z*,,| —b, al, the expected value of Z, and Bx, respectively, Eole) 
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and o.”, the expected value and variance of z, all under the condition 
that the distribution of z is g..(z). Then always 


(10.4.2) E.{Zn| —b, a] = Euin | —b, alB.(2) 
and, if £.,(z) = 0, 
(10.4.3) BIZ’, | —b, a] = Eoln | —b, alo? 


Proof. Let Sy, N > No, be a truncation of the sequential procedure 
under consideration; i.e., Sy follows the sequential procedure for N ob- 
N 


servations and then terminates. Taking expectations of > z; with 
i=l 

respect to the joint event n and Z, arising from the procedure Sy, we 

have 


0) NE.) = Ti 


M> 


i=l 


I 


z) = X Patr DAPA n -i| 


i=l 


i=l 


N 
+ Poln > N)E, [Eal n> x| 


where the event (n > N) means that the non-tr 


uncated procedure did 
not terminate at N. Now, for n = j, 


N 


(2) B| Zala =i] - £.(Zaln=3) 


t= i=l 


a + Eole) [EN — j) | n = jl} 
Substituting (2) into (1), we get 


N f N 
8) 0= È Paln = j)Ev [2al n= i] = Eole) E GP a(n = j) 
j= i=l j=1 
N w 
+ Polín > N)E, [= z|n> x| HEN Z P.(n = j) 
i=1 j=N+41 


i 


N 
We observe that, forn > N ED EA a, so that by Theorem 10.4.1 
i= 


t=1 


N 
lim P,(n > ME (Zz n>N) =0 
N >o i: 


and 


lim N © Pn=j)=0 


N >o i=N+ 


This proves the first part of the theorem. The second part is proved in 
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N 2 

a similar fashion by taking the expectation of (= z) with respect to 
i=l 

the joint events n and Zn. 

Theorem 10.4.3. Let y(t) = Eu(e'*) be the moment-generating 
function of z, and let it be assumed to exist for all real ¢. Then a neces- 
sary and sufficient condition that there exists a t = tẹ with ts ~ 0 such 
that pulto) = 1 is that £.(z) = 0 and that z take on both positive and 
negative values with positive probability. 


Proof. To prove the sufficiency, we observe that ¢”.(¢) = E.,(z*e'*) 
> 0 unless z #0. [Since ¢,(¢) exists for all ¢, it is differentiable any 
number of times as is shown by Lemma 11.5.1.] Thus g(t) is a convex 
function of z. Now by assumption there exists a value z’ > 0 such 
that P.(z > 2’) =u>0. Hence for t> 0 


(1) DX eM gu(z) > DO e pE > u” 
2 z>7 


and consequently y(t) > © as t — œ. A similar argument shows 
that g(t) > œ% ast > —o, Thus g(t) assumes a minimum value at 
a unique point say, t* for which ¢’.(t*) = 0. Now ¢’.(0) = Ea(2), so 
that ¢* #0 unless Z.(z) = 0. Since ¢.(0) = 1 and g(t*) < ¢.(0) 
whenever 2,(z) = 0, it must follow that there exists a t = ty ~ 0 such 
that ¢.(t) = 1. To prove that the condition is necessary, suppose 
P.(z > 0) = 1, and let Px(zg = 0) =a <1. ThenP,(z > 0) = 1 — a. 
We shall show that gu(t) > a ast > —œ. Let t<0. For any «, 
0 < e < 1 — a, we can find a positive number c such that P,(0 < z < ¢) 
<e. Then 


(2) a S gult) = Eule") < [a + e] + [1 — æ — de 
and hence 
(3) a< lim g(t)<at+e 

t>o 


Since e is arbitrary, lim g.(é) =œ. We see then that ¢'(t) > 0 for 
praa 


all t, and hence g(é) = 1 has no solutions other than ¢ = 0. A similar 
argument shows that, if P.(z < 0) = 1 and P.(z = 0) < 1, then ¢'(é) 
< 0 for all ¢. This completes the proof of the theorem. 


„Corollary 1. The value of t» in the above theorem is opposite in 
Sign to £,,(z). 


Proof. For allreal y, e” > 1 + yso that g(t) = Ey (e7) > 1 + tf,(z) 
and, since ga(ts) = 1, teBe(z) < 0. , 
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` Theorem 10.4.4. (The fundamental identity for real values of t.) 
For a given w and for all real ¢ such that e(t) > v (where v is defined in 
Theorem 10.4.1) 


(10.4.4) Elele) 7] = 1 


and, if P.(z > 0) > 0 and P..(z < 0) > 0, then (10.4.4) holds for all 
real t. 


Proof. Let the sequential procedure Sy be defined as in Theorem 
10.4.2. Then since 


N 


DS Zi 
D Bafe] = te) + EN = leal” 


we can write 


@ 1=E È "en ] 
=D Pn = jE [E e In =3] 


N 


DE} 
+ Poln > NE, [e = lA | n > x| 


F N 
But, for n = j, È. z: is independent of ` z;, and therefore 


i=1 i=j+1 
N 


tD zi 
6) E E = Cead) |n = i] = Bale” les) | n = 9 


Equation 1 thus becomes 


N 
(4) 1= E Pum = AEMP | n = 9] 


j=1 
7 leod : i 


Since, for n > N, —b < Xei < a, then, by (4) and Theorem 10.4.1, 
for N sufficiently large, 


n>N] 


N 
© 0S1- E Pen = Eee) n = 9 <[2_]” 
& e) |n = jl < [= = K 
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where K (ê) is positive and for fixed w depends only ont. Letting VN —> », 
we see that, for all real ¢ such that a(t) > v, equation 10.4.4 holds. 

Suppose now that z takes on both positive and negative values so 
that a(t) has a minimum which is assumed at, say, t = t*. Then it 
follows from (4) that for all real ¢ 


; , CAO) ad [eo(t*)]* 
(6) P.(n > N) < KO Pon > N) < Eo 
and hence 
7 ee er ee aly KO 
(1) O<S1— È Paln = jE lee |n = JS ie K(¢*) 


j=1 


Thus, if we let N — ©, we see that in this case equation 10.4.4 holds 
for all ¢ with the possible exception of ¢ = ¢*. Even in this case, the 
fundamental identity holds, but the proof will be omitted. 

We observe that for all w for which Z,(z) # 0 equation 6 gives an 
upper bound for P.(n > N) in terms of the minimum of ¢,(¢). 


PROBLEMS 


10.4.1. Prove that the moment-generating function of n exists for all real ¢ such 
that eb < 1/v. Hint. Write 


a > N, ye 2. P 
Bale) = Yo Paln = fei < È Poln =j t De Poln > jet 
j=1 j=1 J=N+1 


and employ Theorem 10.4.1. r j 
10.4.2. Prove that, if gu(t) exists for all real ¢, the fundamental identity can be 


differentiated any number of times with respect to t. Then, using this fact, give an 

alternative proof of Theorem 10.4.2. Hint. Show that the kth derivative of the 

remainder in equation 4, Theorem 10.4.4, converges to zero for all k as N > ©, 
10.4.3. Show that the fundamental identity holds for all ¢ in the complex plane 


for which | ga(t) | > » where | eo | stands for the modulus of gult). 
10.4.4. Using the results of Problem 10.4.3 show that Ea(e'^") = 1 for all ¢ in the 


complex plane for which ga() = 1. 


10.5. Approximation for Pu(Zn < —b) and for E,,(n) 


Let z have probability distribution Jol), and let P.(z > 0) > 0 and 
P.(z <0) >0. We define 


(10.5.1) To(—b, a) = Piu(Zn < —b); J= Tu(—b, a) = Pul(Zn 2 a) 


Furthermore, let E.(n| —b, a) stand for the expected number of obser- 
vations required by the sequential procedure under consideration. 
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Assume that E,,(z) Æ 0 and that ¢,(é) exists for all real . Then, by 
Theorem 10.4.3, there exists a t = te ~ 0 such that 9.(f.) = 1. Sub- 
stituting tu for ¢ in the fundamental identity (10.4.4) we get 


(10.5.2) Ese) = 1 


We shall neglect the excess of Z, over the boundaries —b and a and 
assume that Z, takes on only the two values —b and a. Then, keeping 
in mind this approximation, we can write (10.5.2) as 


(10.5.3) ta(—b, ale + [1 — mu(—b, a)]e = 1 


Multiplying (10.5.3) by e and solving for To(—b, a), we get 


eats — ghu 
(10.5.4) Ta(—b, a) = ee 


Employing the same approximation, we also get 
(10.5.5) Eu(Zn| —b, a) = —bra(—b, a) + afl — To(—b, a)] 
and consequently from (10.4.2) 


(10.5.6) E,(n| —b, aj = EEE Hral a) 
E.(z) 


If E.(z) = 0, then ta = 0, so that (10.5.4) becomes indeterminate. 


However, if we apply L’Hospital’s rule and differentiate the numerator 
and denominator of (10.5.4) with respect to ta and set te = 0, we get 


10.5.7 mabe) = 2 (2) = 
( 7) (=b, a) eR (Z.(2) = 0) 


Also from (10.4.3), the above approximations and (10.5.7) 


(10.5.8) E.(n| —b, a) = = (E.(2) = 0) 


As an illustration of the application of the above formulas, let z be 
a binomial random variable whic ; 


h takes on the value u with probabilit; 
w and —u with probability 1 — w. Then puli) = wet + ci et 


Eol) = uw — 1) and o4? = 4uo(1 — o). Setting a(t) =1 and 
solving for e", we easily obtain ee = (L—«\/" 


Substituting these 


W 


expressions in the approximate formulas, we get for w = 1/2 


Sec. 10.5 APPROXIMATION FOR P,(Z, < —b) AND FOR E,(n) 277 
t ai (a+b) /u C =i blu 
ov ii ry 
—— 
1 —a (a+b) /u 
oi 
w 


a — (a + b)ro(—b, a) 
u(2w — 1) 

ab 
4u?a(1 — w) 


Note that, if u = 1 and a and b are integers, there will be no excess 
over the boundaries, so that formulas 10.5.9 and 10.5.10 are exact. 
These formulas, in fact, represent a solution to the “Gambler’s Ruin” 
problem which has been frequently treated in the literature. Two 
special classes of statistical problems where u = 1 and a and b are in- 
tegers will be considered in the next section. 


(10.5.9) te(—b, a) = 


(10.5.10)  By(n| —b, a) = 


and, for w = 1/2, Bu(n| —b, a) = 


PROBLEMS 


10.5.1. Employing the approximations, obtain expressions for 7.(—b, a) and 
E.(n | —b, a) in case 

(a) g4(z) is normal with unit variance and mean w. . PO 

(b) z = x — u where u is a positive constant and z has a Poisson distribution 
with mean w. i 

10.5.2. In Problem 10.5.1, plot ze(—b, a) and E,(n| —b, a) as functions of w 
fora = 5, b = 10. : 7 Ba 

10.5.3. Let y be a numerical-valued random variable with probability distribu- 


PON 


tion f(y) under the hypothesis Ho (# = 1, 2), and let z = os) Consider a 


sequential-probability ratio test with boundaries A 21, B <1, and let a = log A, 
=b = log B. Let a = 1 — m(—, a) be the error of accepting Hz when H1 is true, 
and 8 = x9(—b, a) be the error of accepting Hı when Hs is true. Employing the 
given approximations, show that 

= B 
1-8 pe 


i =g 


a 
Hint. Let goulz) (w = 1, 2) be the distribution of z under He. Show that te = 1 for 
o = 1 and t = —1 for w = 2. Hence solve for A and B in terms of æ and £ from 
equation 10.5.3. , ETSAI SETENE 
10.5.4. Let z be a numerical random variable with probability distribution f(z) = 
B(w)e**r(z), Show that any sequential-probability ratio test for testing the hy- 
Pothesis H, that w = w against the alternative He that w = w2 is reducible toa 


Sequential test of the type: Accept Hı if a < —b; accept He if 2 zi > a; con- 
t=1 = 


tinue sampling otherwise, where z; = ti — % and v is a constant. 
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10.5.5. Express the risk function p* for a non-truncated sequential dichotomy in 
terms of the approximate values of Eoln | —b, a) and 7.(—b, a) given in (10.5.4) 
and (10.5.6) respectively, and obtain approximate expressions for y and 6 by mini- 
mizing this function with respect to these quantities. 


10.6. The Determination of the Stopping Regions for a Special Class of 
Dichotomies 


We shall consider in this section two cases of dichotomies in which 
the boundaries y and ô [see Section 10.3] can be obtained with relative 
ease without any approximations. 

Case 1. Binomial Dichotomies with Hypotheses Symmetric about 1 72. 
Let y be a binomial random variable which takes on the values 1 and 0 
with respective probabilities pı and qı = 1 — pı under the hypothesis 
Hy, p2 and gz = 1 — pə under the hypothesis Hə. We assume that 
Pi < P2 and that pı + p2 = 1; i.e., pı and pə are symmetric about 1/2. 
Let ¢ be the a priori probability for H,, and let the loss matrix be given 


0 w 
by ( “4 - The problem is to determine y and 6 which define the 
We1 


class of Bayes procedures for testing H, against Ho. 


3. 
Let yi, Y2, +++ be a sequence of observations on y, and let mz = ye 
izi 
Then the likelihood ratio [see (10.3.3)] is given by 
pa”iq "i 2m; fy \5 
(10.6.1) y= = (») ¢ 3) 
Pian pı p2 
so that [see (10.3.5)] 
i Da 
(10.6.2) > zi = 2m; log pa jlog pi 
i=l Pı P2 


Hence, defining ay and —b; as in (10.3.6), we see that the class of Bayes 
procedures can be characterized as follows: Continue sampling as long as 


—b a j 
eee a TOE ewer es 
2 log (p2/p1) 2log (p2/pı) 2 


and terminate sampling and accept the appropriate hypothesis as soon 
as, for some j = n, one of the two inequalities in (10.6.3) does not hold. 

We define x; = 1 if y: = 1, x; = —1 if y; =0. Then (10.6.3) can 
easily be shown to be equivalent to 


(10.6.3) +5 < m; 


br 2 ay 


10.6.4 — < D 2; < —_ + _ 
C ; log (p2/p1) ze “te (p2/p1) 
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N 

Moreover, since when we terminate sampling, the quantity Ds x; is an 
isl 

integer, we might as well take the boundaries in (10.6.4) as integers. 

Thus, if y and 6 are known, the Bayes procedure for a given ¢ is de- 


fined as follows: We set 


fas (d= wey io (1 — aT 
(10.6.5) a"; = e dy P —b* = _ = ps 
uF Log (p2/p1) * Log (p2/p1) 


where the symbol [k] stands for the smallest integer greater than or 
equal to k. We continue sampling as long as —b*; < Lr < a*r. 
ist 
n 
We terminate sampling as soon as for some sample size n either >> 2; 


isl 


n 
= a*, or >) 2; = —b*;. In the former case we accept Ho; in the lat- 
isl 
ter case we accept Hy. 


n 
Let m(—b*;, a*,) = Po = ti = -0), and let Es(n| —b*;, a*;) 
i=l 
be the expected number of observations required to reach a decision 
when H, (w = 1, 2) is true. Then without any approximations [see 


remark at end of Section 10.5] 


(po/piyttt — (p/p) 
(pa/p) "t — 1 


(pi/ po) Pe (pi/, po)’t 
(p/p) t" =1 
a*r — (a*r + b*;)m1(—b*y, a*t) 
Pi — Pi 


(10.6.6)  mı(—b*;, a%) = 


(10.6.7) m2(—b*, a*t) = 


(10.6.8) Ey[n| —b*, a] = 


a*t — (a*r + b*¢)72(—b*, a*t) 
P2 — Pı 


(10.6.9) Ealn| —b*:, a*] 


Now let ¢ = y. Then the Bayes procedure is defined by the bound- 
aries [see (10.6.5)] a* = 0, —b*,. Hence, employing equation 10.3.13 
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and excluding the trivial case b*, = 1, we get 

(10.6.10)  Elp*(P7)] = (1 — v)pelE2(n |—(6* — 1), 1) 

+ warma(— (b*; — 1), 1)] + yiwie(l — pı) + nil (n |—(6*, — 1), 1) 
+ wia{l — 7(—(6*, — 1), 1)}]} 

Similarly, if we set ¢ = 6 in (10.6.5), we get 
(10.6.11)  Elp*(1'6)] = 6p2[#,(n |—1, b*,, — 1) 
+ wie(l — m(—1, b*, — 1))] + (1 — 8){wer(1 — pı) 


+ pilE2(n|—1, b*, — 1) + weym9(—1, b*, — 1))} 
Setting 


(10.6.12) wizy = 1 + Elp*(Ty)], (1 — ô)wzı = 1 + Elo*(T8)] 
we obtain two equations in two unknowns which may be solved in the 


The Bayes Risk as a Function of the a priori Distribution for a Dichotomy with 
Hı: pı = 3, Ho: p = 3, ¢ = 1, w = wy = 38.25 


AO g 


000I y=02 03 04 05 06 0720805 o 
t 


Figure 16 


Minimax strategy for the statistician: A sequential probability ratio test with bound- 
aries (—8, 3) or (—2, 2), or any mixture of the two, or a 50:50 mixture of two se- 
quential probability ratio tests with boundaries (—8, 2) and (—2, 3), respectively. 
Nature’s least favorable distribution: 


f=(,1-H= (0.5, 0.5) 
following manner: Guess an integer b*,. Then the equations in (10.6.12) 


are linear in y and 6. Solve for y and 6, and substitute these values in 
the formula 
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(1 — y)]! 
HTE 
(10.6.13) AS 2 =e 


log (p2/p1) 


If the resulting quantity has a value equal to the guessed b*,, the com- 
puted y and 6 are correct. If not, repeat the process. 
The risk from the optimum sequential procedure under consideration 


The Bayes Risk as a Function of the a priori Distribution for a Dichotomy with 
Hı: py = 3, He: pe = $, ¢ = 1, wie = 50, wn = 100 


16 


14 


es) 8 


°o a1 Ja2 03 04 05 06 07 08 09 10 
y= 0.17045 t ô= 0.92990 


Figure 16 
Minimax strategy for the statistician: A mixture of two sequential probability ratio 


tests, one with boundaries (—4, 3), the other with boundaries (—4, 4), in the pro- 
Portions 0.58682 and 0.41318, respectively. Nature’s least favorable distribution: 


Ẹ = (t, 1 — t) = (0.62175, 0.37825) 

Can be computed as a function of ¢ as soon as y and 6 are known and 
18 given by 
(10.6.14) e*(t) = (By (n|—b*,, a*) + (1 — OE(n |—b*;, a) 

+ fwyell — mi(—b*s, a*)] + (1 — Qwarte(—b%, a*r) 
Where for a, fixed t, a*; and b*; are given by (10.6.5). Since a*r and b*; 
are integers, the curve obtained by plotting p*(¢) against ¢ will consist 
°F connected line segments. This is illustrated in Figures 15 and 16. 


_ Case 2. Double Dichotomies. We are given two binomial popula- 
tions Tl; and Up, defined by two parameters pı and pz where p, = 
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Po (y = 1), qo = L — Po = Po (y = 0) for w = 1,2 and pı < po. Let 
H, stand for the hypothesis that pı is associated with Il, and pə is 
associated with Iz, and let Hə stand for the hypothesis that pz is asso- 
ciated with II, and pı is associated with Iz. Let wis be the loss in ac- 
cepting Hə when H; is true and wo; be the loss in accepting Hı when 
Hy is true. Let ¢ be the a priori probability that Hy is true. The prob- 
lem is to determine y and 6 which characterize the Bayes procedures 
for testing these hypotheses. 

We shall assume that the sequence of observations will be taken in 
pairs, one from II; and one from I, and we shall designate this se- 
quence by (y11, Y21), (V12, Y22), «++ where Vi; (i = 1, 2; j = 1, 2, vee) 
takes on the value 0 or 1. If the above assumption is made, the steps 
required to compute y and 6 parallel those of case 1, provided we 
define x; = yo: — Yı: and substitute in each formula, beginning with 
(10.6.5), p19 for pı and poq; for po. The verification of this fact is left 
to the reader [see Problem 10.6.1]. Note that, for each 7, the random 
variable z; = yo: — yı; takes on the values —1, 0, 1, with respective 
probabilities g2p1, pip2 + 4192, Pog under Hy; and qipe, pipe + G92, 
pıq2 under Hə. Thus z; = 0 has the same probability of occurring 
under either hypothesis and consequently yields no useful information 
for discriminating between them. 


10.7. Examples of Trichotomies, i.e., @ = (1, 2, 3), A = (1, 2, 3) 


Example 1. Assume that the random variables y1, y2, --- are inde- 
pendently distributed and all have the same distribution. Each yn 
takes on the values 1, 2, 3 with probabilities specified by one of the fol- 
lowing alternative hypotheses: 


States of veng 
\f 
Nature 1 2 3 
Hı 0 1/2 1/2 
He 1/2 0 1/2 
H; 1/2 1/2 0 


Let w;; be the loss if H; is accepted when H; is true; the values are 
given by the following table. 
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States of Decisions 
Nature H, Hs H, 
Hı 0 4 6 
H: 6 0 6 
H; 4 6 0 


Note that both of these matrices are invariant under a cyclic permuta- 


tion of the hypotheses and events. 

Let (i) be the a priori probability of H;. As we have seen, a point 
È = (E(1), £(2), £(8)) e= with (1) + (2) + ¿(8) = 1, may be repre- 
sented by a point in an equilateral triangle with unit altitudes [see Sec- 
tion 6.3], the distance from the point to the three sides being the values 
&(1), £(2), and £(3). Let P; be the point where &(?) = 1 (i = 1, 2, 3). 

Let p(Ẹ, $) be the average risk under a non-truncated sequential 
Procedure § when the a priori distribution is. Let S*o be the optimal 
Sequential procedure where no observations are taken and let T(wi) be 


the region in = where H; is accepted under S*o. Let §-w; = 2) w(i) 
i=] 


be the risk in accepting H; without experimentation when the a priori 
distribution is £. Then T(wı) is defined by ¥(§) = min §-w; = §-w,, 
le., by the inequalities 


(10.7.1) E-w, < &-wo, Ewi < §-ws 
or 
(10.7.2) 6E(2) + 4€(8) < 4€(1) + 6E(8) 


6E(2) + 4&(8) < 6&(1) + 4£(2) 
That is 


(10.7.3) &(1) > max GEO) — 408), 382) + 34(3)) 


At P, £(2) = £(3) = 0, £(1) = 1, so that Pi e T(w,). When ¿(3) = 0, 
(10.7.3) becomes as > 3(2), while (1) + £(2) = 1, so that (1) >3, 
When £(2) = 0, £(1) > 2£(), &() + £@) = L a es in 2 tA 

ot r ing lines for ¿(1) are equal when 2&(2) — 3 2 

wo lower bounding lines for (1) ec zO = 


882) + 3208) = e(1), or (1) = #2) = £8) = 3 
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all points above the boundary defined by the polygon with vertices 
(3, 3, 0), (2, 3, 3), and (3, 0, $). T(wə) and T(w3) can be obtained by 
successive cyclic permutation of the coordinates. [See Figure 17.] 


R 


P B 
Figure 17 


Let 8*(§) be the optimal non-truncated sequential procedure for a 
given a priori distribution Ẹ, and let £;(7) be the a posteriori probability 
of H; given that yı = j where 


(10.7.4) s= e 


D prt(k) 
k=1 


and where p,; is the probability that y; = j under H; (note that here 
the subscript j in £;(2) designates the value of the first observation and 
not the number of coordinates observed). Let 8*,(E) be the sequential 
procedure defined as taking one observation and then using the pro- 
cedure 8*(&;) based on £; = (Ł;(1), &(2), £(3)) when yı = j. $*,(E) is 
the best sequential procedure involving taking at least one observation. 
In the present case, pj; = 0, so that (j) =0 by (10.7.4); therefore 
8*(§,) is the optimal sequential test for a dichotomy. 

Let A(w;) be the region in £ for which H; is accepted without any 
observations under 8*(§). As has been shown, the regions A(w;) are 
all that is needed to determine all the tests $*(Ẹ). They are convex, 
and their boundaries are characterized by the relations 


(10.7.5) ¥(E) = 1 + Elp*(TE)] = p*(&, 8*,(E)) 
so that 


(10.7.6) A(w:) C T(w,) 
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It is first necessary to find the optimal tests for each of the dichoto- 
mies formed by taking pairs from the trichotomy Hy, Hə, Hz. Consider 
the dichotomy H,, Hə. Then &(1) + &(2) = 1. Suppose é(1) is such 
that it pays to take at least one observation. From (10.7.4), since 
Pig = 1/2 (= 1, 2), 


(10.7.7) gli) = 


KD 
E) + £2) 


Hence, if y, = 3, the a posteriori probabilities are unchanged, and, as 
shown previously, the optimal test calls for taking another observa- 
tion. On the other hand, pi = 0, so that &(2) = 1. Therefore, if 
Yı = 1, the process should be terminated and Hə accepted. Similarly, 
if yı = 2, the process should be terminated and Hı accepted. It fol- 
lows that, if £(1) is such that the best sequential procedure calls for 
taking at least one observation, then the best procedure is to sample 
indefinitely until yn = 1 or 2; in the former case, accept Hə; in the 
latter, M. The probability of accepting the wrong hypothesis is zero 
under either hypothesis; the risk is then the expected number of obser- 
vations, which is 2 under either hypothesis. 

The boundaries 712 and 612 of the interval of £(1)’s in which it pays 


to go on are then determined by the equation, 


10) 


(10.7.8) wy =2 or y= 1/2 
wa = 612) =2 or ôg = 2/3 
Returning to the specification of S*,(£), we note that, if yı = 3, 
(1) pis 
3 
Dd Eh) pes 


k=l 


(i) = 


AS p33 = 0, piz = 1/2 fori = 1, 2, 


W @=1,2) 
EC) + &(2) 
ed as follows: If y1 = 3, H3 is rejected 
i a e ept Hə; if E(1)/E01) 
entirely; if £(1)/(E(1) + £(2)) < 1/2, stop and accept 
+ &2))'> 2/3, stop and accept Hy; if 1/2 < £0)/GU) + £2) < 2/3, 
Continue sampling until yn = 1 or 2; at which point stop and accept 
2 or Hy, respectively. The three conditions on &(1)/(E(1) + £(2)) can 
be written in the simpler form, 
(10.7.9) ga) <e) ED 2AE) E2) <E) < 8) 


f(z) = 


Therefore $*,(&) can be describ 
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respectively. The cases where y; = 1 or 2 can be obtained from the 
preceding case by cyclic permutation of the numbers 1, 2, 3. 

Let p*,(&) be the conditional expected risk associated with 8*,(§) 
when the a priori probability distribution is Ẹ, given that yı = j; and 
let p: be the a priori probability that yı = j. Then 


3 
(10.7.10) p* (E, S*1€)) = © rip*s) 
j=l 


3 
(10.7.11) pj = DX patli) 
i=l 


4&(1) 
&(1) + &(2) 
=3 if &2) <&(1) < 2&2) 


L+ C) if (1) > 2&(2) 
= ——— i 
&(1) + &(2) B 


p*ı(È) and p*2(€) can be obtained from (10.7.12) by cyclic permutation 
of the subscripts. 

The region A(w,) can now be determined by relations 10.7.5 and 
10.7.10 to 10.7.12. First note that, when (3) = 0, the problem re- 
duces to the dichotomy between H, and Ho already discussed, so that 
the interval from (1) = 1 to &(1) = 2/3 on the line £(3) = 0 belongs 


to A(w,). Hence the point (2/3, 1/3, 0) lies on the boundary of A(w;). 
This point satisfies the conditions, 


(10.7.13) (2) < (1) < 2&2), —-E(2) > 2£(3), E(B) < EC) 


Consider the intersection, if any, of the boundary of A(w,) with the 
region A; defined by (10.7.13). Using (10.7.10-11) and (10.7.5), 


1 2 2 3 1 3 
BSE PPIE Ez PTE L 


ll 


(10.7.12) p*3(8) if &(1) < (2) 


p*o(E) = §-w 
From above, (10.7.12) and (10.7.2), we obtain 
3 &(1) + &(2) 4 &(2) + £(3) (: 6£(3) ) 

2 2 £(2) + £(3) 


&(1) + £8) 4&(3) 
1 
j 2 ( j &(1) + £(3) 


) 6&(2) + 483) 
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or 
(10.7.14) £2) = 1/3 


The intersection of (10.7.14) with the line £(2) = 2&(8) occurs at the 
point (1/2, 1/3, 1/6), which satisfies the conditions (10.7.13) and so 
lies on the boundary of the region 4;. As 41 is convex, it follows that 
the boundary of A(w,) actually does intersect A; and there coincides 


Boundaries of Regions Determining the Optimum Test for a Trichotomy: 


y=1,2,3 0 4 6 
Hy: p=90, 2,2 7=16 
0, 22 w=(6 0 4 
Ho: p= 2,0,3 46 0 
Hs: p =%, 2,0 


A 


Ps 


(0,22) 
Figure 18 


ast favorable to the statistician: t= 


With the line segment joining (2/3, 1/3, 0) and (1/2, 1/3, 1/6). The 
latter point satisfies also the conditions 
£3) < £2) < 9¢(3), E8) < +0) 


P, (0,5,3) 


» ae E ` 
G33) 


Nature’s strategy le: 


(10.7.15) (2) < (1) < 2&2), 


Let Ay be the region defined by (10.7.15). Then we cas find aa Before 
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the intersection of the boundary of A(w,) with A2; the boundary hits 
the line £(2) = £(3) at the point (3/7, 2/7, 2/7), which point lies in Ag. 
Hence the boundary of A(w,) actually does intersect Ao and there coin- 
cides with the segment joining (1/2, 1/3, 1/6) to (3/7, 2/7, 2/7). If 
we continue this method, it can be shown that A(w,) is bounded by 
the polygon with vertices (2/3, 1/3, 0), (1/2, 1/3, 1/6), (8/7, 2/7, 2/7), 
(2/5, 1/5, 2/5), (1/2, 0, 1/2), and (1, 0, 0). It is easily verified that 
A(wı) actually is a subset of T(w,) as demanded by (10.7.6). The ver- 
tices of the polygons bounding A(w) and A(w3) can be obtained by 
cyclic permutation of the coordinates. [See Figure 18.] 


R 


P, 


B 


(053) (43) (043 
Figure 19 


Example 2. The boundaries of the regions A(w;) might also be 
straight lines, as shown by the following example: Let 21, %, «= be 
independently distributed with the same distribution, and x, takes on 
the values 1, 2, or 3. Let H; be the hypothesis that x, = i with proba- 
bility 1 (i = 1, 2, 3), and let wij = 3 for i = j, wz = 0. Assume the 
cost of each observation is 1. Then the best test involving at least one 
observation is clearly to take exactly one observation and accept H; 
if xı = i. The expected loss due to incorrect decision is 0, so that the 
risk of this test is 1. Hence the boundary of A(w,) is characterized by 
the relation, w£(2) + wsi£(3) = 3(E(2) + £(3)) = 1, or gay = 2/8: 
Similarly, A(we), A(ws) are defined by the inequalities £(2) > 2/3, 
(8) > 2/3, respectively. 
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If no observations are taken, the region T(w;) in which H; is accepted 
is characterized by wo1£(2) + waig(3) < min (wi2(1) + ws24(3), wisé(1) 
+ wes(2)) or 1 — (1) < min (E(1) + £(8), (1) + £(2)), 2) > max 
(1 — £(2), 1 — £(8)). 

The boundary is the polygonal line with vertices (1/2, 1/2, 0), (1/3, 
1/3, 1/3), and (1/2, 0, 1/2). The boundaries of T(w2) and T(ws) are 
found similarly. The regions A(w;), A(w2), A(ws) clearly lie inside 
T(w,), T(we), T(ws), respectively. [See Figure 19.] 


P 


(4,0,4) 


(44,0) 


P 


0,44) 


Figure 20 
nples, inner boundaries of the re- 


Example 3. revious exar ) 
A — he risk of accepting H; if no ob- 


gions A(w,) were found by equating t i 
servations are taken with the risk under the best procedure calling for 


taking at least one observation. The regions so found were in ny 
Cases subsets of T(w;). However, this relation need not hold in general, 
as shown by the following example. 
Let all conditions be the same as in 
for i se j. Then the risk of accepting Hı 
to the risk of the best test, taking at least on 


Example 2 except that wij = 5/3 
without observations is equal 
e observation when £(1) 


è 5 ee here as in Example or a 
nd the region £(1) > 2/5. A(wi) is boun 

Onal line with vertices (1/2, 1/2, 0), (2/5, 2/5, 1/5), (2/5, 1/5, 2/5), 
and (1/2, 0, 1/2). [See Figure 20.] 


290 SEQUENTIAL PROCEDURES FOR 2 AND A FINITE Ch. 10 


PROBLEM 


10.7.1. Obtain the stopping regions for the sequential trichotomy given in Figure 


0 6 6 
18 with W = (s 0 s): 


6 6 Q 


10.8. Minimax Strategies in Sequential Games with Finite Q and A 


We have shown that when both Q and A are finite there exists a se- 
quential-sampling plan and a terminal-decision procedure which mini- 
mizes the average risk for every £ This means that, for every mixed 
strategy of nature, the statistician has a pure strategy which is optimal 
against it. Thus, if nature’s mixed strategy were known, the statisti- 
cian’s problem would be solved, except for the problem of character- 
izing the optimal sequential procedure $*. But often nature’s mixed 
strategy is unknown. In this case, the minimax strategy might be 
employed; that is, the statistician might select that decision procedure 
which minimizes the maximum risk. Since the Bayes risk p*(£) is a 
concave and continuous function of £ defined over a closed and convex 
set = in a finite dimensional Cartesian space, it will attain a maximum 
at some point = * eZ. Hence ¢* will be nature’s least favorable 
distribution. If the Bayes sequential procedure against &* requires at 
least one observation and if fẹ(x;) is a density for each w, then this 
Bayes procedure, which is a pure strategy for the statistician, will in 
fact be a minimax strategy. This, however, no longer holds if fe (a;) is a 
discrete probability distribution, and in this case the statistician may 
have to resort to mixed strategies. 

As an illustration consider the dichotomy discussed in Section 10.6. 
Suppose we have obtained y and 6 and found that p*(£) given in (10.6.14) 
has a maximum at ¢ = ¢* and ¢* does not belong to a stopping region. 
Then in order that the sequential test $* (defined by the boundaries 
—b*;» and a*;») be independent of ¢, which is required here for a mini- 
max solution, we must have 


(10.8.1) Ey[n|—b*p, a*] + wie — m1(—b* ps, a*ps)] 
= Fon |— d*s, a*t) + Wworma(—b*p, a*r) 


But, since a*;» and b*,+ are necessarily integers, this equation will usu- 
ally not be satisfied. This implies that nature’s strategy ¢* must have 
the property that when we set ¢ = ¢* in (10.6.5) (removing the braces) 
either a*; is exactly an integer, or b*;« is exactly an integer, or both are 
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integers. Suppose for example that a*;* is an integer but not b*;+. 
n 


This means that when >> x; = a*;» we have two courses of action to 
i=l 

follow. We can either stop and accept Hə or go on experimenting with 
the best sequential procedure defined by boundaries a = a*;« + 1, —b 
= —b*;s. But these two procedures will not always have the same risk 
for ¢ = ¢*. Thus, in order to make the risk independent of ¢ we shall 
have to employ a mixed strategy. That is, we shall have to employ one 
procedure some specified fraction « of the time and the other procedure, 
1 — æ of the time, where 0 < œ < 1. The above is illustrated in Figures 
15 and 16. 

Another remark which might be of interest is that the minimax se- 
quential procedure even in relatively simple situations need not be 
unique as is illustrated in the problem dealt with in Figure 15. As an- 
other illustration, consider the trichotomy discussed in Section 10.7. 
Here the minimax procedure is to stop with one observation and ac- 
cept Hy if yı = 1, He if yı = 2, and Hı if yı = 3. This can be seen 
from the following considerations. 

By (10.7.12) the maximum conditional risk, given that yı = 1, is 3, 
and it occurs when £(3) < £(2) < 2&(8). Similarly, the maximum con- 
ditional expected risk, given that yı = 2 and 3, respectively, are both 
equal to 3, and they occur when £(1) < 2¢(2), &(2) < &(1) < 2(2), re- 
spectively, Any Ẹ satisfying these three conditions will be a least favor- 
able a priori distribution, and, clearly, the only set of values is &(1) = 
&(2) = (3) = 1/3. If yı = 1, the corresponding a posteriori distribu- 
tion is (0, 1/2, 1/2). This is the boundary of A(ws). In general, then, 
the minimax procedure is to take one observation, stop, and accept Hs 
if y = 1, Hy if y, = 2, and He if yı = 3. The risk associated with 
this test is 3, of which the cost of observation is 1, and the expected 
loss due to incorrect decision is 2, independent of the true a priori dis- 
tribution. ; 

The above minimax test, however, is not unique, the lack of unique- 
ness corresponding exactly to the inclusion or exclusion of the boundaries 
of A(w,), A(we), and A(ws) in those sets. If we exclude the boundaries, 
then, when yı = 1, we continue. So long as y; = 1, the a posteriori 
Probabilities remain at (0, 1/2, 1/2): when yj = 2(3), the a posteriori 
Probability of H3(H2) becomes unity. Therefore, a second minimax 
Procedure is to stop the first time yn # Yn—ı and then accept that hy- 
Pothesis whose subscript equals neither Yn nOr Yn—1- The maximum 
risk is again 3; all of this is represented by the expected number of ob- 
Servations which is the same for all a priori distributions. 
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10.9. Another Optimal Property of the Sequential-Probability 
Ratio Test 


In Section 10.3 we have shown that the sequential-probability ratio 
test is optimal in the sense that for a given a priori distribution E= 
(¢, 1 — §) it minimizes the average risk. We shall now prove that for 
a fixed probability of making erroneous decisions, this test minimizes 


the average number of observations when H 1 is true as well as when 
H; is true. 


Theorem 10.9.1. Leta sequential-probability ratio test $* be defined 
by the boundaries A > 1 and B < 1, and let æ be the probability of ac- 
cepting Hz when H; is true, 8 the probability of accepting Hı when He 
is true, when $* is employed: Let $ be any other test procedure that re- 
sults in probabilities a’ < a, 6’ <B of making erroneous decisions. 
Then 
(10.9.1) 
and 


(10.9.2) 


Eln | 8*; Hy] < E(a | S; Hy) 


E(n | 8*; He) < E(n| 8; He) 
where the symbols in (10.9.1 
ber of observations re 
hypothesis. 


) and (10.9.2) represent the expected num- 
quired under the specified decision procedure and 


Proof. Choose any § = (¢, 1 — 5) such that 0 < ¢ <1, and solve 
for y and ô from (10.3.4) in terms of A and B. Then since the quantity 
by entering in (10.3.13) and (10.3.15) is equal to log (A/B) [see (10.3.4)] 
we can solve for wis = u(y, ô) and we; = u(y, ô). 

The three quantities wily, 6), uer(y, 6), and È have the property 
that, if the a priori distribution is € and if the risk of accepting Ha when 


A, is true is u(y, 8) and the risk of accepting H, when He is true is 
uai(y, 6), then S* has minimum average risk. The average risk under 
8* is given by 


(1) 0G, 8*) = tEn | s*; H) + 1 — )E(n | 8*; H3) 
+ Surely, da + (1 — duos (y, ôB 


Similarly, the average risk under $ for the same triple, &, 


uar(y, ô) is uzly, ô), 
©) ofG S) = | 8; m) + HR] s; Ha) 


+ Surely, da’ + (1 — Juary, ôB’ 
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Now since p(E, $*) < p(È, S) we must have 
(8) fB(a| 8*; Hy) + (1 — DEQ | 8*; He) < Eln | 8; Hh) 

+ (1 = )E@| 8; He) 


But inequality 3 must hold for all values of ¢,0 < ¢ < 1. Hence, from 
continuity considerations we must have 


(4) E(n| s*; Hı) < E(n|s; Hı) and E(n| $*; Ho) < E(n | 8; He) 


This proves the theorem. 


CHAPTER 11 
Estimation 


11.1. Introduction 


A statistical-estimation problem with a fixed sample-size experiment 
has the following structure: We are given a sample space Z = (Z, Q, p) 
and a numerical-valued function @ defined on Q whos 
tician wishes to estimate on the basis of the outcom 
zeZ. (For the sake of simplicity we restrict oursely 
tion of numerical-valued functions 9. Many of the 
ter carry over easily to the case where @ is vector-v 
domized decision function for the statistician, us 
mate, is a numerical function ¢ defined on Z, specifying for each z the 
number a e A which will be chosen to estimate 0() when that z is ob- 
served. The space of actions A is here the real line. The loss function 
L defined on Q X A is non-negative and for many estimation problems 
is usually bounded only from below. Thus the only functions ¢ that 


can be considered as possible estimates for 0 are those for which the 
risk function p with 


e value the statis- 
e of an experiment 
es to the considera- 
results of this chap- 
alued.) A non-ran- 
ually called an esti- 


p(w, t) = x Llo, t(2))p(z | w) 


is finite. Another restriction which is 
is often not unreasonable is that, for an 
convex in a and further that, as a > 
This assumption has two interesting ¢ 
statistician never needs to consider randomized estimates. The other 
is that any estimate u which is not a function of a sufficient statistic t 
is inadmissible, and that v = Eslu |t) (which is independent of w [see 
Definition 8.2.1 and Definition 8.2.2]) 


: Constitutes an improvement. 
These facts are incorporated in the following two theorems. 

Theorem 11.1.1. In any estimation pro 
fixed w convex in a and —> + 


is complete: In fact, if ņ is a 


usually placed on L and which 
Y v £9, L(w, a) is assumed to be 
+o, L(w, a) => +o for all w. 
onsequences. One is that the 


See. 11.1 INTRODUCTION 295 
mates t;, fg, +-+, with probabilities \1, A2, ---, the pure estimate 
#@ = DAs) 
i 


is superior, i.e., 
p(w, t*) < p*(@, 2) 
for all w. 


Proof. 
lo, 2) = E UAL, E)E |o) 


= x ple |o) [= Llo, se| 


Now, by Theorem 2.2.8, t*(z) exists since by assumption the series 
By Llw, t;(z)) converges for all z for which p(z |) > 0, and 
2 


E ajL, t;2)) = Llo, “@) 
i 


These two facts prove the theorem. 


Theorem 11.1.2. Consider an estimation problem where Llo, a) is 
for fixed w convex in a. Then for any statistic t which is sufficient on 
Z=(Z, Q, p) and any estimate u of the function 0, the estimate v = 
E.(u| t) which is a function of t only, has the property that 


(i) p(w, v) < p(w, u) 


forall w. If L(%w, a) is for a given w strictly convex in 4, actual inequality 
holds in (i) for that w unless u is a function of t, i.e., unless v and u 
coincide, 

Proof. To prove this theorem we first restate Theorem 2.2.8 as a 
emma involving conditional expectations. 

Lemma 11.1.1. For any real-valued random variable y, any conyex 
unction hk, and any random variable t, 


Eho) | > MB | OI 


for all. If h is strictly convex, actual inequality holds unless y is con- 


Stant for the given u. 
We now return to the proof of the theorem. For fixed w 


pe, u) = Balbo, Wh, Plo ») = Balle, o) 
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Application of Lemma 11.1.1 with y = u and 


(2) hlu) = Le, u) 
yields 
(3) E[L(w, u) | d > Llo, v()) 


for all ¢. Multiplying both sides of (3) by p(t| w) and summing with 
respect to ¢ yields 


(4) plo, u) > p(w, v) 
If L(w, a) is strictly convex in a, we have actual inequality in (3) unless 

= v for the given ¢, so that we have inequality in (4) unless u = v for 
all ¢. 


The remainder of this chapter will deal with several classes of prob- 
lems in estimation where some general results are available. It is a 
fact that thus far only the framework of the general theory of estima- 
tion has been established. Specific applications to many general classes 
of problems await further research. This is particularly true in the 
case of sequential-estimation procedures. 


In the sequential games considered in Chapters 9 and 10, a decision 
proced i i 


or of fixed sample size, 
the other hand, present n iti 


uces the choice of d to the solution of fixed 
sample-size problems: for any £, the Bayes estimate d = a(j, x) is for 
each j the Bayes estimate t;(x) against £ in the fixed sample-size estimation 
problem in which x, -. *, ©} are observed. Thus, for estimation problems 
with a prescribed $, which are of some interest, the Bayes solutions are 
easily described. These re 


marks are illustrated in several of the prob- 
lems in this chapter. 
PROBLEMS 
ULL Ife = (on, o i ZN), where zi, ++-, ry are independent with py {xi = J} 
= woj =1, ++, “w = 1, 2, +++, and L(w, a) = (a — «)?, show that 


pagli Petey 
N 
is not an admissible estimate for w. 


11.1.2, If t is admissible in a given estimation problem, it remains admissible if 
L(w, a) is replaced by Li(w, a) = A@)L(w, a), Mw) > 0, 
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11.2. Bayes Estimates for Special Loss Functions 


We are given a sample space Z = (Z, 2, p) a numerical-valued func- 
tion @ on Q, and a priori distribution £ on Q, and a space of actions A 
which is the real line. The problem is to find a Bayes estimate of 0 on 
the basis of the outcome of an experiment z £ Z, i.e., to find a real-valued 
function ¢ on Z which minimizes the average risk where for each w eQ 
and each a e A the loss is given by L(, a). 

The solution to the above problem is as follows: Let 


(11.2.1) Pie) = È peloto) 
wel 


and let r,(é) be the a posteriori risk, given that the estimating procedure 
tis used and z is observed. (Note that we do not exhibit the dependence 
of the function 7, on £.) Then it follows from Definition 3.6.5 that, if 
P(E, t) is the risk from ¢, 

(11.2.2) plt, t) = © 72(t(2)) Pe) 

Although in this case the function L is not necessarily bounded, the 
arguments of Theorems 6.2.2 and 9.2.1 still apply to show that, for a 
given z and £, the Bayes estimate is obtained by choosing that value ¢ 
Such that 


(11.2.3) t: = inf 72(a) 


a minimum in A, then a Bayes estimate will 
actually exist. This situation arises very frequently in many statistical 
Problems. It may happen that the right-hand side of (11.2.3) is infi- 
nite for some z for which Pz(z) > 0, in which case no estimating pro- 
Cedure of finite risk exists. In most cases of interest, however, the Bayes 
risk is finite, 


The above remarks together with the pertin i i l 
Chapter 6 concerning the class ® of Bayes solutions furnish a fairly 
Complete description of Bayes estimates when considered from a gen- 
eral point of view. However, specific properties and characterizations 
of Such estimates are generally difficult to obtain except for relatively 
Simple types of loss functions. In this section we shall consider 3 types 


of such loss functions which are of theoretical and practical interest. 
e shall also consider a prediction problem which can be cast into the 


ramework of a Bayes estimation problem. 
1. Bayes Estimates for Quadratic Loss Functions. 


(11.2.4) Llo, a) = Mw)[6(e) — as Mw) > 0 


If r.(a) has, for a given 2, 


ent theorems proved in 


Let 
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For any function y on 9 and any eZ we shall denote by Ely | z] 
the conditional expected value of y, given z. Thus fixing a and setting 
y = L, in (11.2.4), we get 


(11.2.5) - 72(a) = E0 — a)? | 2) 


Theorem 11.2.1. For each z, Tala) < œ either for no a, for exactly 
one a, or for every ae A. The second case implies that F;fà | 2] = « 
and the third that Ex{\ | 2] < o. 


Proof. We first show that, if 7-(a) is finite for two distinct points, 
Say, @1, 2 with a, < az in A, then it is finite for allae A. 


The function A(0 — a)? is convex in a so that, for all a in the closed 
interval (a, a2) and all w, 


(1) MDO) — a? < ETE olo) a 
ag — ay 


q holo) — asl? 
a2 — ay 


and hence, for all ae (a1, a2), r-(a) < œ since 


a2—a a—a 
(2) 7:(a) < 72(a1) + — ;,(a) 
ag — ay a2 — ay 


Now let b e (a, as), an 
Then from the identity 


6) @- 0 =@-c+e—92 


d let c be any arbitrary point in A with c = b. 


= @ = 0? + 26 — e)(e — 0) + (e — 0)? 
we readily obtain 


(4) (© — 0)? +26 — c) (e — 9) < (b — 0)? 
and consequently we conclude that 

(5) PIN- 0)? + 206 — ele ~ 6)} | a] < co 
Setting c = 0, we get 

(6) 


Bel {br — 26n} |z] < œ% 
Since b is any arbitrary point in (a1, a2) i 
» 42) it follows that F [\ | 2] < © 
and therefore Elo | 2] < o The finiteness of Eie? | 2] a follows 
ae the eee of 72(a,). Thus 72(@) is finite for all aecA 
e next show that if 7,(a ) < œ and F, i 
Epir 1 HA | 2] <0 then 7,(a) < © 
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Since \(w) > 0, Schwarz’ inequality yields 

(7) [EDO — a1) | al] < Ealo- a || 2] = IVNN- a |) |e] 
< [BA | PQ] 6 — a ? | 2)” 
= [EA | 2))"lr-(@]4 < © 

But, since 

(8) Eno | 2] = Eefayd + A — ay) | 2] 

we have by (7) and the assumption of the finiteness of ZA | 2] that 


E,{0 |z] < ©. The finiteness of E,[\0? | 2] follows from the finiteness 


of 7(a,;). Thus 7,(a) < © for all a £ A. 
Finally, if r(a) < œ for all ae A then Ey |2] < œ. This follows 


since in that case (6) holds for all b e A. 
From Theorem 11.2.1 we see that if 7,(a) exists for two values of 
ae A, or, equivalently, if r-(a) exists for one value of a and H[à | 2] < %, 


then 7.(a) exists for all a, and, moreover, 
(9) Elo? | 2] < @, i=0,1,2 
Hence 7.(a) can be written as 
(10) rala) = a Eal | 2] — 2a E0 | 2] + Eero? | 2] 
which for a given z is a quadratic function in a and therefore has a mini- 
mum at a* = é(z) where 


Efo | 2] 
(11) TE- 


ER | 2] 


We thus have the following. 

Theorem 11.2.2. In any estimation problem where the loss is Ear 
by (11.2.4), if the a priori distribution £ admits any estimate e ite 
expected risk then it admits a Bayes estimate, and essentially only one, 


defined thus: 


t*(z) =b if rb) <% onlyfor bed 


= Eelo Rl | a if (a) <o forall aed 
ED | 2] 
; i by (11.2.4). Then, 
Theorem 11.2.3. Let the loss function be given ‘ : 
if for a given a priori distribution £ which admits an nami oe 
risk the expectation of the function à exists, then the a*i ee 
is biased G.e., the expectation of ¢* is not identically equal to 9) or the 


ayes risk is zero. 
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Proof. Let E:(A) be the expected value of \ with respect to an 
a priori distribution £. Then the existence of (A) implies the exist- 
ence of E:(\ | 2) for all z for which P;(z) > 0 as can be seen from the 
equation 


a) EN = E EA | 2) Pe(2) 
We define 

_ Mw)Ew) 
(2) hlo) = EQ) W 


Then the Bayes estimate with respect to the loss function (11.2.4) and 
£ is the same as the Bayes estimate with respect to the loss function Lt 
defined by 


(3) L*(w, a) = [0(w) — a}? 


and the a priori distribution &. Since £ and hence & admits an esti- 
mate of finite risk, then the Bayes estimate with respect to & exists 
and is given by 


(4) t*(2) = Ex (0| 2) = Eg (0 | t*) 
by Problem 3.11.6. Now suppose ¢* is unbiased; i.e., 
(5) E(t | w) = blo) 


for allw. Then 

(6) B(@t*) = EEG | z)] = BP) = EPE | w)] = BC) 
from which it follows that 

(7) HNL 2 [0(w) — t*(z)Pp(z | otl) = 0 


The assumption of the existence of the second moments of 0 and ¢* 
which is implied in the above proof can be removed. 

If £;(A) does not exist, then Theorem 11.2.3 no longer holds, as ca” 
be seen from the following example: 

It is desired to obtain a Bayes estimate of a binomial probability & 
of a success, on the basis of N independent trials when the a priori dis- 
tribution & is rectangular over the interval (0, 1) and the loss function 
L is given by 


Here 
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so that fro dw = æ. On the other hand, if m represents the num- 
ber of saguesses in the N trials, then, by Theorem 11.2.2, the Bayes 
estimate *(m) if m # 0 or 1 is given by 
1m tly = oy 
0 oll — v) 
1o”(1 — am 
0 w(1 — w) 
T(m + IW — m) T(N) m 
= T(N + 1) T(m)I(N — m) N 


which is unbiased. For this example, it is also easily verified that 


— af 
E [£ a) 
w(1 — w) 
m = 0, and only one value of a, viz. a = 1, when m = N, so that 
i*(0) = 0 and ¢*(N) = 1. 
2. Bayes Estimates Where the Loss Depends on the Absolute Error, 
i.e., 
(11.2.6) Llo, a) = d(w)| Ow) — a, Aw) > 0 
As in the case of the quadratic risk function, let r:(a) be the condi- 
tional expected value of the above loss function with respect to a fixed 
a priori distribution ¢ when z is observed. Then 


(11.2.7) re(a) = Eo) 6) — a | | 2] 


t*(m) = 


m| exists for only one value of a, viz. a = 0, when 


As in Theorem 11.2.2, we have, 

Theorem 11.2.4. For each z, 7:(a) < % either for no a, for exactly 
one a, or for every as A. The second case implies E;(\ | z) = œ, and 
the third case implies Ez( |2) < %. 

We first show that, if r(a) < © for a; and a with a, < az, then 
Ex(\| 2) <œ. This follows since 


(1) Mla) (a2 — a1) < Mw)(| a2 — Olo) | +] 4) = ar |) 
and hence 
(2) (az — a) EA | 2] < 72(@2) + T2(a1) < © 


Now consider any b e A, and let a = a, — b. Then 


6) A(w)| Olo) — a | < A)| Ow) — ar | Hao) | 
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and hence, since by (2), F(A | z) < ~, we have 
(4) 72(a) < T2(@1) + | b [EA | z) <a 


and, since b is arbitrary, we see that r-(a) < % for all ae A which proves 
the theorem. 

We see from the above that, if for a given z, 7,(a) is finite for two 
points of A, or, what is equivalent, if 7.(a) is finite for any one value 
of a and F(A | z) < ©, then 7,(a) < © for allae A. This fact will per- 
mit us to give a complete characterization of the Bayes solutions for 
the loss function under consideration. However, before we do this, we 
require a preliminary definition and theorem. 


Definition 11.2.1. Let p(x) be the distribution of a real-valued ran- 
dom variable, let A be the real line, let 


m, = inf [P(x < a) > 1/2] 


aed 


m = sup [P(x > a) > 1/2] 
aeA 


and let I be the closed interval (m1, m2). Then any me T is called a 
median of the distribution p(z). 
Lemma 11.2.1. Let m be a median of p(x), and let 
g(a) = E| z — a| = Z | z —alp(x) 
exist for at least one ae A (and hence for all a). Then g(a) is mini- 
mized by a = m. 
Proof. Letce AandcgI. We shall show that 
a) ¥(c) = Elle -—c| —|x—ml]>0 
Assume for the sake of argument that c > m. Then 
(2) ¥(c) = = (c — m)p(x) + 2 (m — c)p(2) 
+ DY (+m — 2x)p(x) 
m<r <c 
Now the third term of (2) is clearly positive and moreover, 


1 
(3) zy (c — m)p(x) > me —m), x (m — c)p(x) | < ; (c — m) 


and hence ¥(c) > 0 which proves the lemma. 
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Now by definition, the quantity 7.(a) in (11.2.7) is given by 


(11.2.8) z(a) = X Mo) bw) — a |E-(o) 
where 
pz | w)(w) 
$a) == ao 
Pape | oE) 
We now define 
7 NOR 
(11.2.9) 7¢(00 | 2) =n Be | 2) 
and 
(11.2.10) r*a) = | 4 — a |re(0 | 2) 
0 


It is clear that any a which minimizes r*.(a) in (11.2.10) will also 
minimize 7,(a) in (11.2.8). This in conjunction with Theorem 11.2.4 
and Lemma 11.2.1 gives a complete characterization of Bayes solutions 
for the loss function 11.2.6 and is incorporated in the following. 


Theorem 11.2.5. In any estimation problem where the loss is defined 


by (11.2.6), if the a priori distribution £ admits any estimate of finite 
expected risk then it admits a Bayes estimate defined thus: 


i*(z) =b if 7*:(b) <% only for beA 
= median of the distribution 7¢(8 |2) 


if +*,(a) < © for all ae A, where 7*, is defined in (11.2.10) and r¢(6 | 2) 
is defined in (11.2.9). 


consider again the problem of estimating a bi- 


rp he basis of N independent trials 


nomial probability w of a success on t 
When £ is rectangular over the interval (0, 1) and 


|o -al 


w(1 — v) 


Llo, a) = 


ifm = timate 
Let m be the number of successes. Then, if m = 0, cess ye — 
Of w is zero; if m = N, the Bayes estimate 1s unity han dara 
N, the Bayes estimate of w is the median of the distribution y 


T(N) o~a _ a a 


rw | m) = FON — m) 
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3. Bayes Estimates of Bounded Absolute and Relative Errors. Con- 
sider the following two loss functions: 


(1.2.11) L(w, a) =0 if |@@)—-al<k 


=1 if |@w)-al|>k 
where k > 0. 


(11.2.12) L(w,a)=0 if |@@)—a| < aa 


=1 if |0()—a|> aa 
where 0 <a <1. 

The loss function 11.2.11 is of particular interest in case z is a vec- 
tor in N space and 6(w) is a location (i.e., translation) parameter in a 
distribution. The loss function 11.2.12 is of interest in case 6(w) is a 
positive scale parameter in a distribution of a positive random variable. 
In this case a is also assumed to be positive. The latter loss function 
can also be written as 


(11.2.13) Llo, a) 


0 if (—a)<6@)/a<l+e 
=] otherwise 


In particular, if we take logarithms of the quantities involved, this loss 
function (employing the same symbols 6(w) and a) can be expressed as 


(11.2.14) L(w,a)=0 if —c<0)—-a<d 


= 1 otherwise 


where c and d are positive numbers. It can be seen that (11.2.11) is a 
special case of (11.2.14). 

To obtain a Bayes estimate for the loss functions under consideration 
and for an a priori distribution & we set 


re(Oo|z)= Z ëlo) 


{w: 0(w) = 00) 


where £,(w) is the a posteriori probability of w, given z. Now, for each 
ae A, we define an interval 


I, = {0:a—c<@0<a+d} 
We also define the function Rẹ by 
(11.2.15) ReIa| 2) = Z r02) 
bela 


It is clear then that the Bayes estimate for this problem is obtained by 
finding that ae A which maximizes Rll | 2) for each z. 
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Eent ie ir Ia which maximizes (11.2.15) is called a modal interval. 
i ‘ios sles Er g= d F. 0 the modal interval reduces to a point which 
Epa i : of the distribution 7:(8 |2. In case the a priori distribution 
e ectangular, the mode of the resulting conditional distribution 
ee = the maximum likelihood estimate ofo. If r(0 |2) isa density, 
n Te imate for the case ¢ = d = 0 is Bayes since the probability 
error with any estimate is zero. 
_let t*(z) be the value of a which maximizes (11.2.15), and let 6*() 
a + | 2). Then, for the location parameter case, B*(z) is the con- 
k Rt oeiy that the estimate of 0 does not differ by more than 
ffs s from the true value, and, for the case of a scale parameter, it 
i conditional probability that the estimate does not differ by more 
an a fraction a from the true value. The quantity 
B= DBP) 
dent of these estimates. The estimates 
ded absolute error in the case of trans- 


l i . . 

a parameters and estimates of bounded relative error in the case of 

r e parameters. Minimax estimates of the above type both for loca- 
on and scale parameters will be considered in Sections 11.3 and 11.4 


respectively. 


be called the confidence coeffic 
will be termed estimates of boun 


an Be Cast into the Framework of 


Payen Estimation. Let y, 1. °°» 7% be k +1 random variables having 
ea distribution g(y, tı, ***» 74) which is assumed to be known. A 
ee problem, which takes the form of a prediction, 1S usually for- 
the oe as follows: The statistician observes only Ti, =y Tk and on 
“ asis of this information desires to predict the value of y. There 
ay be many reasons for deciding to predict rather than observe Y- 
hee cases y can be observed only after a time lag. Thus, for ex- 
war e, if y represents the performance of a complex mechanism, its 
ue can be known only after the mechanism has been assembled. 
nd ee if y represents the possible average grades in college “on 
os icant, then y can be observed only after a lag of four p a 
a possible reason for not wishing to observe Y directly a t 3 i he 
ir ructiveness of the experiment which yields the value a y. í ae 
giv example, is the case where Y represents the tensile streng of a 
in en metallic object or the presence oF absence of an internal tumor 
& human body. 
a predictor of y is a function £ W 
or the into a decision space A. Since 
or-valued random variable, the space 


4. Prediction Problems Which C 


ch vector % = (ti, T2 
be either a numerical- 
n be fairly complex. 


hich maps ea 
y need not 
A ca 
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We shall here consider only the case where y is numerical valued so 
that A is the real line. We shall also assume that, if the true outcome 
is y and ae A is a prediction, then L(y, a) measures the loss due to the 
departure of y from a and that L(y, a) > 0 for all y and a. 

Since q(y, %1, +-+, z4) is assumed to be known, we can compute the 
conditional distribution of x = (x, ---, £4), given y, to be designated 
by p(x| y), as well as the marginal distribution of y to be designated 
by (y). We observe that the variable y is in the nature of a parameter 
in the distribution p(z | y) and that £(y) is in the nature of an a priori 
distribution of y. Thus the problem of obtaining an optimal predictor 
of y is identical with that of obtaining a Bayes estimate of y for the 
given a priori distribution £ and the given loss function L. More spe- 
cifically, the best predictor of y for a given x = (z1, ---, zz) is that real 
number ¢ such that 


È Ly, iry | 2) = inf ZL(y, a)r(y | x) 
Yy aea 


where r(y | £) is the conditional distribution of y for that x. (In case 
rly | x) is a density, then the summation sign in the above equation is 
replaced by an integral sign.) 

As an illustration, consider the case where the joint distribution of 
Y, Tı, ***, % are multivariate normal. Then r(y | x) is normal with 
constant variance for all x and a mean value which is a linear function 
of «1, +++, 7, known as the regression plane of y on zı, -++, ty. Since, 
for each x in this case, the density r(y | x) is symmetric around its mean, 
the value of the regression is the optimal predictor for any of the fol- 
lowing 3 loss functions: 


(1) L(y, a) = ky — a)? 

(2) L(y, a) = k|y — a| 

(3) L(y, a) = 0 if |y—al<k 
Si if |y-al>k 


where k is a positive constant. Also, since the variance of r(y | x) is 
independent of x, the conditional risk for each of the 3 loss functions is 
constant for all x and is therefore equal to the average risk. Thus for 


the loss function (1) the average risk is ko, for (2) it is NE ko, and for 
r 


(3) it is 1 — 8 where 
1 klo 
= eA g 
8 V 2r J j 


and where o° is the conditional variance of y, given z. 
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PROBLEMS 


11.2.1. If L(w, a) is for each w strictly convex in a and > +% asa — +», the 
Bayes estimate for any £ is unique and hence admissible. 

11.2.2. Determine the Bayes estimate ¢ of the probability w of a success in a bi- 
nomial distribution with a sample of size N when L(w, a) = (w — a)? and w has a 
rectangular a priori distribution over (0, 1). 

11.2.3. Show that the estimate 4 of Problem 11.2.2 has a smaller mean square 


error than the estimate = 1S x; (where z; = 1 if the outcome is a success and 
i=1 
zi = 0 otherwise) for all w such that | 4 — o | < -V2/4, and all N. 

11.2.4. Show that the Bayes estimate ¢ for the probability w of a success in a bi- 
nomial distribution with a sample of size N and L(w, a) = 0 if lo —a]| < ô, Llo, a) 
= 1 if |w — a| > ô, where ô is a given positive number, and w has a uniform dis- 
tribution, has the property that | (x) — # | < 6 and that the modal interval always 
includes the sample mean, where Z is defined in Problem 11.2.3. 

11.2.5. If p(t) is a continuous unimodal density, the modal interval (a, a + h) 
satisfies p(a) = p(a + h). 

11.2.6. (a) In estimating w, the probability of a success in a binomial distribu- 
tion, from a sample of size N with L@, a) = (w — a)”, the Bayes estimate for so) 
= Ko"(1 — w)", u > —1,» > —1 has the form až + 8 = tu,» where @ is defined in 
Problem 11.2.3. 

(b) Find w’, v’ such that p(w, two") is a constant p” independent of w, so that 
lw, = a'& + 8" is the minimax estimate of w. 

(c) In estimating the mean of an unknown distribution over (0, 1) from a sample 
of size N, show that p(w, tur) <p’, So that tu’ is minimax for this problem. 

11.2.7. Let x assume values 1, w, each with probability 1/2. Show that, for 
any loss function L(w, a) and any & the Bayes sequential-estimation procedure 
with constant cost per observation is (7) no observations with w estimated by that 
number a = a* which minimizes ByL(w, a), or else (ii) continue taking observations 
until a value of x different from 1 occurs, and estimate as this value of x. 

11.2.8. Let ty for any arbitrary N be the unique Bayes estimate of w for a loss 
function L(w, a), a given a priori distribution &, when Ti, +t, ZN are observed. 
Prove that U(x) = ty(zx) is admissible for any prescribed sequential plan $ when S 
terminates with M observations. , : 

11.2.9. In estimating w in a binomial distribution, let $ be the sampling plan of 
taking observations until m successes occur. Show that t = (m — 1)/(n — 1), 
where n is the number of observations required, is the unique Bayes solution when 
Elo) is rectangular and L(w, a) = (e — a)*/w*(1 — v), and conclude that t is admissi- 
ble when L(w, a) = (w — a)?. Prove that t is an unbiased estimate of w; i.e., Bu(t) 


= o. 
11.3. The Translation-Parameter Problem 


An interesting special case of the general estimation problem is that 


in which the distribution is known except for an additive constant w, 


and in which the loss depends only on our error in estimating v. Thus 
there is given a distribution q over an N-dimensional Cartesian space 
Sy, and with each real number w is associated the distribution p de- 


fi 
ned by ple |o) = 4 — we) 
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where e = (1, 1, ---, 1). A point ye Sy is chosen with probability 
q(y), an unknown constant w is added to each coordinate of y, and the 
result x is observed by the statistician, who is required to estimate w. 
A pure strategy for the statistician is a numerical function ¢ defined on 
Sy. For any choice of t, the risk function p is given by 


(11.3.1) p(w, Ò = D gy) L(y + u) — w) 
Y 
where L(s) is a non-negative function, specifying the statistician’s loss 
when his error of estimate is s, —% < s < %, 
We now reformulate the problem slightly. Lety = (yı, +++, YN) E SN, 
and let 


zlu) =y- y, i=2, N, ryan 


Then for each y, z = (z2, +++, zy) is a point in Sy_, and r is a point in 
Sı. The distribution p determines a distribution g of the vectors z in 
Sy-—1 and, for each z, a distribution h+ of the numbers r in Sı. Thus, 
for each y = (r, z2 +7, +-+, zy + r), we have 


gy) = g(z)h-(r) 


In terms of the new variables z and r the translation problem can be 
formulated as follows: A pair (z, r) is chosen according to the distribu- 
tion g(z)h.(r) on the space Sw—ı X Sı, and the pair (z, xı) with x; =” 
+ w is observed by the statistician who is required to estimate w. Thus 
an estimate in this formulation is a function t on Sy_, X Sı and the 
risk function is given by 


(11.3.2) p(w, ) = Dig@h(r) Le, r + w) — w) 


ar 


Notice that, if we are given z and x, we can either estimate w directly 
or estimate r and subtract it from 2, to obtain an estimate of w. Thus, 
if (2, x1) is an estimate of w, the function 
(11.3.3) u(z, z1) = x — tz, xy) 


is an estimate of r and conversely, if u(z, xı) is an estimate of r, t(z, *) 


= x, — u(z, xı) is an estimate of w. In terms of the function u, the 
risk p can be written as 


(11.3.4) p(w, u) = È Dd) g(e)ha(r)L[r — uz, r + w)] 


We shall thus consider the problem of estimating r, having observed 2 
and x; = r + w, where the risk of an error is given by (11.3.4). 


See. 1 Y 7 
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a w > eas an unknown real number, the observation tı 
principle pre dhe “sed value in estimating r, and the invariance 
ig he, Seer 7 in ( ‘hapter 8 leads us to expect that we can ignore 
Bia lk, eth a ention to the class I of invariant estimates u based 
Note a : out increasing the maximum risk. 
, if wis a function of z only, the estimate 


h ilz, £) = (tr ta, 0 ay) = tı — Ul) 
a x 
s the property that, for any real number s, 


n Izi +s, =, aN + s)=st Iln to zy) 

n esti s : P r 7 

. e satisfying the above is said to possess the translation prop- 
Fr 

pla Di a 1.3.4) we observe that, if w is 
» u) is independent of œ. Moreover, if we set 


an invariant estimate, the risk 


a. 
oe Fv) = DAL — v) 
We get = 
(11.3.6 
z p(w, 1) = Dg Fe(ul)) = ole) 

We now define z 
(11, 

am F(z) = inf F-@) 
So that a 
(11, 

ie ot = LIOF < 0) 
fr 5 
aa uel, and choosing u*(z) 8° that F(u*(2)) < F(z) + ¢ for all z 
(11, 
M = pyl te 
u 4 ToS, no we I has a risk less than P*; and we see how to choose 

as required. Thus, if we re- 


ene to approximate p* as closely 
Such esti selves to invariant estimates, 
v, which mates by evaluating the infimum © 
rom the of course, is a function of z. This result also per 
of th e discussion of Section 11.2 as can be seen from an exami z 
© risk function given in (11.3.6). Thus we have transformed the 


taise: 
i “lation-parameter problem into the framework of a Bayes estima- 
Ives to jnvariant estimates for the 


reformulated the problem so that 
meter, in the distribu- 


d the optimum of 


f (11.3.5) with respect to 
follows directly 


Strict 


on 
r 
q Ee by restricting ourse 
the rang r. More specifically, we have 
om variable r becomes a location para 
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tion of z2, -++, Zn, Which we are required to estimate. But its a priori 
distribution is known, since it is precisely the distribution of xı for 
w= 0. The above remark makes it possible to characterize the opti- 
mal estimates of this type for the various loss functions considered in 
the previous section. Thus, for example, if 


Ler — v) = k(r — v}, k>0 
then the optimal invariant estimate u* of the number r is given by 


u*(z) = Do rhi(r) = E(r|z) = E(2, | Tə — Ti, +++, ty — tı) 


Tr 
t 
and consequently the corresponding estimate of w is given by 
t*(z, t1) = 2, — uz) = 2 — Elz | t2 — t1, +++, ey — a1) 


Referring again to the invariance principle, we would suspect that 


the estimate u* not only is optimal among all invariant estimates, 
but that, for all estimates u, 


(11.3.10) polu) = sup p(w, u) > p* 
(Note that the definition of po(u) for all u in (11.3.10) is consistent 


with that in (11.3.5) for we I.) The main result of this section is that 


under certain mild restrictions on the loss function L, the inequality 
11.3.10 holds. 


Theorem 11.3.1. If for every e > 0 and every ze Z there is a set & 
such that 


(11.3.11) D h.lr)L(r — v) > Fhe) — e 
reR 
for all v, then (11.3.10) holds. 


We remark that the hypothesis of this theorem will be satisfied if 
either 


(1) L(s) is bounded for —% < s < « 
or 
(2) L(s) > +% as s— +o and Lis continuous 


Proof of (1). If L is bounded, the series 11.3.5 converges uniformly 
in v, and there is a finite set R such that 


È he(r)L(r — v) > Fv) — e 
reR 


for all v. Since F(z) < F.(v) for all v, it follows that (11.3.11) holds. 
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Proof of (2). We show that, if b,(v), be(v), -++ is an increasing se- 
quence of non-negative continuous functions with by(v) > +o as 
v — =e for all N, and, if b(v) = lim by(v) and b = inf b(v) is finite, 
then, for every e > 0, there is an NV with bN (v) > b — e for allv. The 
proof of (2) then follows since we can take 


N 
bye) = È h:(r)L(r: — v) 
i=1 
where ry, rə, «++ is an enumeration of the r’s for which h.(r) > 0. ; 
Since by hypothesis b;(v) —> © as v => =%, we can choose an in- 
terval (A, B), such that b\(v) > b + 1 for v outside (A, B). Let vy 
be the value of v that minimizes by(v) and, say, 


min by(v) = b*y = by(vy) 


Then A < vy < B for all N, and we may suppose, choosing a subse- 
quence of b*y if necessary, that vy —> v*. Also b*y < b for all N, 
say, b*y — b*, Then 


by (Vn) < ban) < d* 


for N < n, so that by(v*) < b* for all N, and b(v*) < b*. Thus b(v*) 
= b* = b, and b*y — b; i.e., there is an N for which 


min by(v) = b*y >b—e 


We proceed to the proof of Theorem 11.3.1. Let 7, 72, +++ be an 
enumeration of the possible r values, i.e., those values of r for which 
9(2)h:(r) > 0 for some z. With each sequence a = (m, no, +++) of in- 
tegers n; with n; = 0 for sufficiently large 7, we associate the number 


v(a) = 2 niri Then from (11.3.2), 
(1) plola), u] = E ge)h:(r) Llr: — ule, vl + ô) 


Where 6; has n; = l, n; =0forj#ži 
Let Amy consist of all sequences a with 
for i 2> N. Then 
a > plula), u] = Do g@) [= D hrir- (y| 
MN z 


ae An B ieJ(M, N, p) 


| ai] < M, for all i, æ; = 0 


Where J (M, N, 8) consists of all i for which 8 — ô; € Aary- 


312 ESTIMATION Ch. 11 


Since 8 — ô; £ Amv for Be Amı, y, 7 < N, we have 


(3) DX ee), > Voge) E Hye, ue) 
ae Ayn z BeAmuN 
where 
N 
(4) Hy(z,t) = Dhr)Lr; — į 
i=l 
For any e > 0, choose 2, ---, z so that 
k 
(5) È gle)F e) > p* — e 
1 


and choose N, as can be done by hypothesis, so that 
(6) Hy (zi, t) > F(z) — e 
for all ¿ = 1, ---, k and all ¢. Then from (3) and (6) we obtain 


n 
(7) 2 Pela), u] > > g(z)[F(z:) — a) a(M — 1, N) 


ae Ayn 
> (* — 2əa(M — 1, N) 
where a(M, N) = (2M + 1)” is the number of points in Amy. Now 
È plv(a), u] < a(M, N)o* (u) 


ae AMN 
which together with (7) yields 
a(M — 1, N) 
u) > ——— [* -2 
polu) aI, N) [p e] 


Letting M — œ yields po(u) > p* — 2e which, since e is arbitrary, 
completes the proof. 

. We remark that randomized estimates play no essential role in the 
translation-parameter problem. The explanation here is not convexity, 
as in Theorem 11.3.1, but the fact that the invariant estimates have 


constant risk. For, if t, t, +++ are pure estimates with constant risk 
Pı, P2, `++, the mixed estimate 7 using t, tə, ++ with probabilities ^1, 
de, +++ has constant risk p = Dd,pp_. Necessarily pp < p for some n, 


so that there is a pure estimate whose constant risk is less than or equal 
to that of 7; i.e., we have 


Theorem 11.3.2. In any translation-parameter estimation problem, 
the class of pure invariant estimates is a complete class, among the class 
of all invariant estimates. 
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An example in which (11.3.10) fails is the following. N = 1; 


qk) “KED K=1,2,---, f(s) = max (s, 0) 


An invariant estimate is a constant u, and 


K-wu 
fa ee for al 
plu) KET D +o forall wel 
so that p* = +o. The non-invariant estimate u(s) = M| s|, where 
M > 1, has risk 
K-M|K+o| 
p(w, u) = =—— aia a 
KK + 1) 


K>M|K+el 


For w > —(M —1)/M, K > M| K + w | is impossible, and p(w, u) 
=0. Foro < —(M — 1)/M,K > M|K + w| implies K > M|K + «| 


Jol, 


M M 
and K > —M(K inge = pi z 
M(K +u). Thuslettinge = 3p 4qleh@= 37 —4 


we have c < K <d. Consequently 
p(w, u) < H 
c 


Í d+1 dx 
Zoef = 
eaK+1 ve a 


M+1 Had [aa] 
oc tag | | le 
log [7 t Hel Sel" m-i 


From this it follows that 
M+ | 


< j= 
sup p(w, u) < toz | mı 


so that (11.3.10) fails. 

The most interesting statistic 
Scale parameters concern probabi 
crete. For this reason, the illustrative examp 
and the following section will deal with densities. Theorem 11.3.1 holds 
for densities and more general distributions in case the loss function is 
bounded or quadratic. The proof of this can be found in an appropri- 
ate reference at the end of the book. 
_ Let zi, to, +++, av be N independ 
Ing distribution f(z; — w) where —% < ti 

is a density. It is desired to obtain the minimax es 


al problems involving translation or 
lity distributions which are not dis- 
les to be given in this 


ent observations with each z; hav- 
< œ and —œ <w <% and 
timate of w possess- 
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ing the translation property when the loss function L is given by 
Llw, a) = klo — a)’, k>0 
Let t* be the desired estimate. We shall show that ¢* is given by 


Ed N 
{Í 0 [I f(r: — 0) do 


—« i=l 


(11.3.12) (in +++, ty) = 


o N 


II f(x: — 0) ao 


—a i=l 


whenever an invariant estimate of w of finite quadratic risk exists. To 


see this we set zj = 2; — t1, j = 2, ---, N and let 0 = zı — r. Then 
n N 
ff Thies + nar 
Par +++, ey) = x, — 2 


a N 
fJ lieta 
-n i=2 


= 2 — E(x, | zə — Ti +++, ty — 1%) 

since it is easily seen that 
N 
fr) ISe: r) 

(11.3.13) her) = ———> 

fo Tye: + w au 

—0 i=2 
Let again x1, £2, ---, &y be N independent observations with density 


S(v; — w) for each z;, and for any estimate a of w let the loss be given by 
Lw,a)=0 if |o—al<k 
= 1 otherwise 


It is desired to obtain a minimax estimate of w possessing the translation 
property. To accomplish this, we find the modal interval of length 2% 
in the distribution 11.3.13 and take the midpoint of this interval, say, 
u*(zo, +++, zy) as an estimate of r. Then the minimax estimate of w 
is given by 

i*(22, +++, 2N) = T1 — U* (22, ++, zn) 


Let 8 be the probability that ¢* will be within k units of w. Then 


o % u*+k 
(11.3.14) 6 =f -f (f 7 har) g(Z2, +++, zy) dz2, +++, AZN 
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where h.(r) is given by (11.3.13) and 


w N 
Iez aen) =| S Se + u) du 


—7n i=2 


Since the risk of using é* is constant, we see that ¢* is an estimate of 
w of bounded absolute error with confidence coefficient 8 which is inde- 
pendent of w. These results are quite general, and we thus have 


Theorem 11.3.3. If is a translation parameter and it is the only 
unknown parameter in the distribution, then there always exists a mini- 
max estimate of it which is of bounded absolute error with a confidence 
coefficient 8 which is independent of w. 


We remark that the value of the confidence coefficient 8 will depend 
on the sample size and can ordinarily be made to assume any desired 
value by adjusting the sample size. We also remark that when, N = 1, 
T = xı and h.(r) = f(x). This is of particular interest in case the class 
of distributions under consideration admits a sufficient statistic which 
can be treated as a single observation. To illustrate, if the loss function 
is quadratic and N = 1, the minimax estimate is given by 


(11.3.15) t*(@ı) = zı — E(x) = tı — constant 


PROBLEMS 


11.3.1. If x = (a, +++, £n) is a sample of size n from a translation-parameter dis- 
tribution and u(z) is any random variable with the translation property u(x + eh) 
= u(x) + h for all x, h, then u — Eq(u | T — ty tt, Tk — T1) = U1 — Eo(zı | z2 — 21, 
">t, &% — qı) where Eo denotes expectation with respect to the distribution of x 
where w = 0, 

11.3.2, Consider the following translation-parameter problem: g(1) = g(—1) = 
1/2, L(w, a) = min o-a], 1], N =1. (a) Show that there are two invariant 
Minimax estimates for w: a(z) = z + 1, t-z) =2—1. (b) Show that Uz) = z+1 
af T <0, U(x) = z — 1 if x > 0 is better than either ¢; or ¢_1, so that no invariant 
Stimate is admissible. 

11.3.3. Obtain the minimax invariant estimate of @ based on N independent ob- 
Servations from a rectangular distribution defined over the interval (0, @ + 1) for 

e loss functions 


G L(6, a) = (0 — a)? 

k Le, a) =|0—a] 

8 L@, a) =0 if |o—al<k 
sï if |o—a|>k 


Where k s 0. 
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11.3.4. In Problem 11.3.3 and loss function (3), consider the following sequential 
procedure: Take observations in sequence, and at the nth observation n = 1, 2, =- 
compute 


ut tk 
Bn = f l, a(r) dr 


Stop sampling as soon as, for some n, Bn > y where 0 < y < 1. 

(a) Characterize this sequential procedure. 

(b) Compute E(n). 

(c) Let N be the nearest integer to E(n), and compute 6 from equation 11.3.14 
for N observations. How does 8 compare to y? 


11.4. The Scale-Parameter Problem 


In some statistical problems we may be dealing with a positive ran- 
dom variable whose distribution is known except for a multiplicative 
positive constant 6, and the loss in estimating @ depends only on the 


relative error. That is, for any estimate b > 0, the loss function L is 
given by 


(11.4.1) L(6, b) = L(6/b) 


As in the translation-parameter case, we may assume that we are 
given a distribution q over an N-dimensional Cartesian space Sy, and, 


with each real positive number 9, there is associated a distribution pe 
defined by 


(11.4.2) pe |0) = p(v/d) = pr1/d, «++, vv /8) 


A point we Sy is chosen with probability q(w), each coordinate of w 
is multiplied by a positive constant 6, and the resulting vector v is ob- 
served by the statistician, who is required to estimate 0. 

The above scale-parameter problem is easily reducible to the trans- 


lation-parameter problem, as can be seen from the following considera- 
tions: We set 


zi = logy, y:=logw,, w = log6, a= logb 
Then 


(11.4.3) L(6/b) = L(e*~*) = L*(w — a) 
pv | a) = ple, ++, NTH) = Po(x — ew) 


We see then that all the results derived in Section 11.3, when properly 
translated, are also applicable to the problem under consideration. In 
particular, if L in (11.4.3) satisfies the conditions of Theorem 11.3.1, 
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then we can state that a minimax estimate of 6 exists and is obtained 
as follows: Let 


r= W, Zj = w;/w, j=2,-,N 


and let h.(r) be the conditional distribution of r given z = (z2, ++, zy) 
and given that 0 = 1. Also let 


F.00) = Lh C) 


For the sake of simplicity, we shall assume that there exists a b* 
such that 
F (b*) = inf F(b) 
b 


Then, for any 22, +++, ZN, 

b* (z2, +++, zN) = b*(vo/vi, «++, vN/0) 
is a Bayes estimate of r = w; and 
(11.4.4) t* (v, +++, Uy) = v1/b* 


is a minimax estimate of 6. Observe that the estimate ¢* satisfies the 
condition 


(11.4.5) i* (cvi, «++, cv) = cl* (vi, +++, vy) 


where c is any constant. Estimates of this type are said to possess the 
multiplicative property. , 
Of particular interest is the loss function 


(11.4.6)  L(0/b) 


1 if |o- b| < ab, or equivalently 
if l—a<6/b<lt+e 


=0 otherwise 


and the random variables v; 0 < v; < © have a distribution density 
SQi/ 0). Here the minimax estimate is obtained as follows: As above, 
let h.(r) be the conditional distribution of 7, given z, and given that 
®=1. Then 


N 


fr) Iez) 
m her) = -7 

f) [I fwzi) du 
Aiso Ter —» int 
(11.4.8) aw) - [no m 


318 ESTIMATION Ch. 11 


Then the Bayes estimate of r is that b = b* which maximizes Q(b), 
and the minimax estimate of @ is given by 


(11.4.9) t*(vy, +++, uv) = v1/b* 


Let 8 represent the confidence coefficient of the estimate t*, i.e., the 
probability that the relative error in the estimate will not exceed a. 
Then 


2 e] a N 
(11.4.10) £ -f -f Q(b*) Lf fu) JI fuz) du | dzə, +++, dzy 
—0 —o —v t=2 


Since the risk of using ¢* in (11.4.9) is constant, we can state the fol- 
lowing theorem. 


Theorem 11.4.1. If 9 is a scale parameter and it is the only unknown 
parameter in the distribution of the type considered, then there always 
exists a minimax estimate of it which is of bounded relative error with 
a confidence coefficient 8 which is independent of 0. 


As an illustration, consider the following problem. A chemist desires 
to estimate the average concentration of inert particles per unit volume 
in a liquid. He desires his estimate to be no more than a specified frac- 
tion a away from the true concentration and wishes to be right with a 
specified probability 8. He assumes that the particles are well mixed 
in the liquid so that the distribution of the number of particles in any 
volume of the liquid can be considered to be Poisson. He spreads & 
known amount of this liquid on a plate which he will examine with a 
microscope. Let A be the expected number of particles in a unit area 
of the plate. The problem then becomes one of estimating À. 

In any specified area A on the plate, the distribution of the number 
of particles m is given by 

=A m 
(11.4.11) nln = 
m! 

We observe that the quantity À is in the nature of a scale parameter, 
but it is not a scale parameter in the distribution (11.4.11). This seems 
to indicate that there exists another random variable for which Aà is & 
scale parameter. We shall now show that, if the chemist were to focus 
the microscope on the plate and then continuously increase the aper- 
ture and stop as soon as the observed area contains exactly m particles, 
then the size of this area is the required random variable. To see this, 
let z designate the size of such an area. Then the event z > g is equiv- 
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alent to the event “The area of size x has fewer than m particles.” 
Thus 
mal gz (yz)i 
P@>2x)= L —— 
jz J! 
Let F(z) be the c.d.f. of the random variable z, and let g(z | A) be the 
density of z. Then 


m— o—\z(\y)) 


Fae) =1- È 7 
j=0 J: 
and 
dF y(x m1 (ye (az)? — Ne T^ (Ne) t) 
a(x) = reo + 5! (Az) 
dx j=l gt 


=Az().\m—1 
ed el 
(m — 1)! 
Thus À is a scale parameter in the distribution of v, and, if \ = 1, æ 
has a chi-square distribution with 2m degrees of freedom. 
Now suppose the chemist observes n non-overlapping areas of size t1, 
‘+, n, each having the property of being the minimum area that in- 


cludes m particles. Then since 2\ >) 2; has a x” distribution with 2mn 


i=l 


degrees of freedom, the distribution of w = D 2; is 
ist 
ne (Aw) 2 


(11.4.12) S|) = Fa 


Thus we can consider our problem as that of estimating à on the basis 
of a single observation w whose distribution is (11.4.12). The mini- 
max estimate of à is here given by 


t*(w) = b*/w 
where b* is the value of b which maximizes 


b( +a) 
f euy"! dy 
b 


(1—a) 


(11.4.13 Donme 
) Q) (mn — 1)! 

Note that b* depends on mn only. Thus the value of mn can be so 
Chosen as to make the confidence coefficient 6 = Q(b*) as large as 
desired, 
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PROBLEMS 


11.4.1. Show that, for small œ and large 8, N = mn = (1/a")z%(14,)/2 where 
21+)/2 is the normal deviate which is exceeded with probability (1 + 8)/2. Hint. 
Employ the fact that if x? is based on a large number of degrees of freedom s, then 


z = (V2 — V23 — 1) 


is approximately normally distributed with zero mean and unit standard deviation. 

11.4.2. Find the Bayes estimate of bounded relative error for A in the particle- 
counting problem when A has a priori distribution e, à > 0. 

11.4.3. Obtain the minimax invariant estimate of @ based on N independent 
observations from a rectangular distribution defined over the interval (0, 0) for the 
loss function 

L@,b)=0 if |0 —b]|< ab 


=1 if |0—-b|>ab O<a<1 


11.4.4. In Problem 11.4.3, characterize the following sequential procedure: Take 


observations in sequence, and at the nth observation n = 1, 2, +--+ compute 
bl +a) 
Bn = hi(r) dr 
b*(l—a) 


where h,(r) is given by (11.4.7). Stop sampling as soon as, for some n, Bn Z V, 
where 0 < y <1. 


11.5. Admissible Minimax Estimates for the Exponential Class 


Let zı, +++, zy be N independent observations with each a, having 
the distribution 
(11.5.1) p(x; | ©) = Blora) 


where the xs are real numbers and w ranges over an interval on the 
real line. The class of distributions 11.5.1 called the exponential class 
was studied in Section 6.7, and it was proved there that the distribu- 


tion of the random variable (£) = (a, +- - -+ xy)/N is also of the ex- 
ponential type. Let 


va) = = Det 


Then by direct differentiation [see Problem 7.4.1] it follows that 


(11.5.2) o(a) = Bafa) = EO 
2. 
(11.5.3) (w) = By(% — B,(2))? = d'y(w) _ dlw) 


dw? dw 
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The problem we shall consider is that of estimating 6(w) when the 
loss function is 

k(6(w) — a)? 

(11.5.4) ity, @) = e 


, o’ (w) 


k>0 


In Chapter 8 we have seen that the function ¢ is a sufficient statistic 
for this class of distributions. Moreover, by (11.5.2), Zu(é) = 0(w), and 
the variance of t is o*./N. Thus, if ¢ were used to estimate 6(w) with 
the loss function 11.5.4, the risk p would be given by 


(11.5.5) p(w, i) = k/N 


for all w. This suggests that ¢ might be a minimax estimate of 6. We 
shall show that, if the range of w is the entire real line, ¢ is not only a 
minimax estimate but also admissible. More correctly, we shall show 
that under this condition ¢ is admissible, and from (11.5.5) it will then 
follow that ¢ is minimax. 
Theorem 11.5.1. If the range of w is the entire real line, then t(x) = 
J 2 . 
N > 2; is an admissible, and hence minimax estimate of 6(w) for the 
i=1 
loss function 11.5.4. 
Proof. Before we prove this theorem, we need the following lemma. 
Lemma 11.5.1. If A, > 0, £, are sequences of numbers, the set of + 
values for which 


elt) = Dy dne™ 
1 


is finite is an interval, i.e., if g(a) and ¢(b) are finite, so is olt) for a<t 
<b. Moreover, », the kth derivative of p, is finite and is given by 


D Ane ne, 


nai 
Proof. Suppose g(a) and ¢(b) are finite. We first show that, for 
any k > 0, the series > hart,e** is absolutely and uniformly convergent 


: 1 ‘ 
m any interval a, < £ < bı witha <a < bı < b. Consider the func- 


tion f defined by Jaji 
gle 


f(z) = e7 


Tt is clear that f is continuous for all x and f(x) —> 0 as x > +0. 
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Since f(0) = 0, it follows that f is bounded above for z > 0. Say 
|z he < Me", z>0 
Similarly, we have 
| x |t < Moe, z<0 
Thus, for a; <7 < by, 
Anl tn [te < Marne + Morne®™ = u(t) 


Since by hypothesis the series Du, (x) converges, the series ZA,t*ne™ 
converges absolutely and uniformly in 7 in the interval a, < 7 < bı. 
Using the theorem that a series can be differentiated term by term over 
any interval in which the resulting series is uniformly convergent com- 
pletes the proof of the lemma. 

We now return to the proof of the theorem. Assume that t is not 
admissible. Then there exists an estimate u better than t. In view of 


Theorem 11.1.2, we might as well assume that u is a function of t. 
Let 


(1) ¥(w) = Eslu) = Bo(w) > u(t)eR(d) 
and set u(w) = 1/Bo(w). Then 

/ í oa #'(w) tw 
(2) Vlo) = Bole) (E ult) l: h RO) 


= Bo(w) D2 u(t)[u — 0(w)JeR(1) 
t 


= E.[u(t — 6)] 


when the differentiation of the series is justified by Lemma 11.5.1. 
Now, by Schwarz’ inequality applied to equation 2, 


(3) Wo? = {Esul — 01}? < oF (w)o,2() 
Now let 
(4) ¥(w) = blw) + Ow), Ve) = b'(w) + oF) 


We also have 

(5) Tu (w) = Bal(u — 0(w))?] — b2(w) 
Substituting (4) and (5) into (3), we get 

(6) ou" (w)b(w) + Do) + ow)? < oP (w)Ba(u — 0)? 
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Now by hypothesis 


(7) E.[(u — 8)?] < oP?) 
Therefore 
(8) a: (w)b? (w) + blo) + o?@)P < oilo) 


It follows from (8) that b'(w) is never positive, so that b(w) is non- 
decreasing in w. Now dropping the term [b'(w)]? from the left-hand side 
of (8) and simplifying, we get i 
daf 1 
2 , || ee Ei: T 

(9) b?(w) + 2b’(w) <0 or = [zl =e 
where b(w) = 0. ; 

Since b'(w) < 0, it is true that either b(w) = 0 for all sufficiently 
large w, or, above some value wo, b(w) = 0. In either event 1/b(w) > 
20 — [wy — 1/b(co)]. Therefore, in either event, b(w) > Oasw > œ, 
and, by the same argument b(w) — O asw — —%. Now, if b’@) < 0 
and b(w) = 0 as w > +, b(w) = 0 for all w. This in turn implies 
that 


(10) êlo) < Balu — 0)? 


as can be seen from equation 6. Consequently, no better estimate can 
exist. 


PROBLEMS 


11.6.1. Show that, in the distribution 11.4.12 with 1/A = 0, the estimate t = w/mn 
of 8 is not admissible for the loss function (t — 0)? and hence that Theorem 11.5.1 
no longer holds if w does not range over ee a line. Hint. Show that if 

= w/(mn + 1 E(t* — 0)? < E(t — 6)° for all 0. ; . 

11.5.2. a D Ts 1 dette for the case a < 2; < b with a and b finite 
by showing that nature’s least favorable distribution for @ is rectangular over the 
'nterval (a, b) and that ¢ is the unique minimax strategy for the statistician. Also 
Show that the estimate £ satisfies the conditions of Theorem 9.2.1 and consequently 


18 of fixed sample size. 


CHAPTER 12 


Comparison of Experiments 


12.1. Introduction 


A fixed sample-size game with finite Q is an S game specified by a 
sample space Z = (Z, 9, p), 2 = (wn ---, wn), a closed bounded convex 
subset W of n space, and a loss function described as follows: When 
the state of nature is w,, the statistician observes z selected according to 
w;, after which he chooses w = (wi, +++, Wa) e W, incurring loss wi 
(See Section 6.1. Henceforth we shall identify the set W with the ac- 
tion space A.) If, instead of observing z, the statistician observes 
merely the value of a random variable f defined on Z, he will generally 
have less information about w than had z itself been observed. How- 
ever, if f is sufficient on Z, we saw in Chapter 8 that f is as informative 
as Z (more precisely, the partition of Z determined by f is as informa- 
tive as Z) in the sense that any risk function attainable with Z is also 
attainable with f. 

Any random variable defined on Z will be called an experiment asso- 
ciated with the sample space £, and in this chapter we shall be con- 
cerned with the problem of comparing two experiments f and g defined 
on the same sample space: If the statistician may observe values of 
either f or g but not both, and if f and g are equally costly, which should 
he observe? Generally, the answer to this question depends on the 
set A of actions associated with the problem, and even when A is speci- 
fied some Bayes solutions may involve observing values of f while others 
may involve observing values of g. We shall study those pairs of ex- 
periments f, g for which f is preferable to g in the strong sense that for 
any A the class of decision functions specifying that values of f be ob- 
served is a complete class, 

As an example of the kind of problem with which we are concerned, 
consider a large population in which each individual has or has not each 
of two characteristics H, S and in which the proportions h, s of individ- 
uals with characteristics H, S are known. What is not known is the 
proportion w of individuals having both characteristics; however, we 
suppose it known that either w = hs, so that the characteristics are in- 

324 
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dependent, or w = ô = hs, a specified alternative proportion. One ex- 
periment which the statistician might perform is the observation of 
1000 individuals with characteristic H, another is the observation of 
1000 individuals with characteristic S, and a third is the observation 
of 1000 individuals from the general population. Denote by xy, ŽS 
Xen, es, X, observations of 1000 individuals from H, S, CH, CS, and 
the general population respectively, where CH, CS denote the charac- 
teristics non-H, non-S. It turns out that there is one of these experi- 
ments which is preferable to all the others, no matter what set A is as- 
sociated with the problem. This best experiment is obtained simply 
by comparing the proportions h, s, 1 — h, 1 — s of individuals with 
characteristics H, S, CH, CS and selecting 1000 individuals with that 
characteristic which is rarest in the general population. 

This example may be formally described as follows, restricting our- 
selves to observations Xu, Xs, teH, Tes. The set Z of the sample space 
X= (Z, 9, p) is the set of all sequences of 1’s and 0’s 


Ui, +t, Uiooo; Vi, +*+, M1000; Wi, ***, Wiooo; Yis ***» Y1000 


where the u; correspond to a sequence of 1000 individuals with charac- 
teristic H and u; = 1 or 0 as individual 7 does or does not possess char- 
acteristic S; the v; correspond to a sequence of 1000 individuals with 
characteristic S and v; = 1 or 0 as the ith individual of this sequence 

es or does not possess characteristic H; the w; correspond to a se- 
quence of 1000 individuals with characteristic CH and w; = 1 or 0 as 
the ith individual of this sequence does or does not possess characteris- 
tic S; the Yi correspond to a sequence of 1000 individuals with charac- 
teristic CS and yi = 1 or 0 as the 7th individual of this sequence does 
or does not possess characteristic H. There are thus 2*°°° points in Z. 
2 consists of two elements wı and wz, which may be described as follows: 


1: py(u;=1) =S, pal = 1) =h 
Pu(Wi = 1) = 8, Palyi = 1) =h 
02: Doli = 1) = ô/ħ, Pali = 1) = 8/8 
—ô h— ô 


s 
Y= , yi = 1) = 
Den (W; = 1) T Peoli ) —_ 


The random variable fy, defined on Z, takes as values the sequences 
lu}, i= 1, +++, 1000; correspondingly the random variables fs, fox, 
Jos, take as values the sequences {v;}, {wz}, {yi} respectively. 

For simplicity, we shall restrict attention to experiments with a finite 
number of possible outcomes. If the random variable f which consti- 
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tutes the experiment assumes the N distinct values z1, ---, cy, and if 
Q = (w1, «++, œn), then the experiment f is described by the n X N 
matrix P = || py ||, where 


Pij = Polle: f(z) = 2;}) 
We have pi; > 0 and >) pi; = 1 for each 7, so that P is what is called a 
J 


Markov matriz. Thus every experiment in which both Q and the num- 
ber of possible outcomes are finite corresponds to a Markov matrix, 
and every Markov matrix can be interpreted as an experiment. The 
experiment fy of the preceding paragraph, for instance, is represented 
by the 2 X 1001 Markov matrix with 


1000\ . : 
Puy = ( \sa — 510003 
J 


‘ex aV g\ 1000-5 
; = - 1-- 
m= (SVG) a) 


We remark that an experiment can be formally characterized in sev- 
eral ways other than as a random variable or a Markov matrix. It 
may also be regarded as a partition of the sample space—we have in 
previous chapters seen what a close relation there is between random 
variables and partitions; from this standpoint the comparison of ex- 
periments is a comparison of partitions of the sample space. 


12.2. Some Equivalent Conditions 


A fixed sample-size decision problem with finite 2 = (w1, ***, @n) is 
specified by a pair (P, A) where P is an experiment, i.e., an n X 
Markov matrix, A is a closed bounded convex subset of n space: a ter- 
minal action is a point a = (a, ---, an) in A, and the loss from action @ 
when the state of nature is w; is a; A er ie function is specified by 
an N X n matrix d = || dj: ||, whose jth row is a point in A, specifying 
the action to be taken when sample point j is observed, and every 
N X n matrix D, each row of which is a point in A, is a possible deci- 


sion function. When the state of nature is w;, the risk is 2 Pijdji, th 


ith diagonal element of the matrix PD, so that the worth o 'Di is speci- 
fied by the risk vector b(P, D) whose n coordinates are the diagonal ele- 
ments of PD. The set of possible risk vectors in the problem (P, A) 


is the set B(P, A) of all points b(P, D) as D varies over all possible de- 
cision functions for the problem. 
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These notions are illustrated in the following simple example. 


n=N=2. Let 


Tı T2 
1 2 
oe F 
P= 
3 2 
O15 5 
w, Wo 
a | y 
D= 
toju v 
Then 
1 2 
gu + gu 
PD = 
3 2 
5c + 5U 


and consequently 


SOME EQUIVALENT CONDITIONS 


A={(@@,y):P?+y <1} 


+y al 


a E 


3y + *) 
By + 3o 


BP, A) = {(a, 6):a=4x+3u, B= 8y th, 
Hy Ll, PHI 


Since we may take x = u and y = 2, it is easily seen that B(P, A) D A. 
he relation between the two sets is shown in Figure 21. 
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Theorem 12.2.1. In any decision problem (P, A), the set B(P, A) 
is a closed bounded convex set containing A. 


Proof. Closure, boundedness, and convexity of B follow immediately 
from Theorem 6.2.1. For any ae A, choosing D with all rows equal to a 
yields b(P, D) = a, so that B(P, D) > A. 


Definition 12.2.1. Let P, Q be any experiments with the samen. We 
say that P is more informative than Q, written P D Q, if for every closed 
bounded convex set A of n space 


B(P, A) D BQ, A) 


Thus P > Q means that, for any specification of the set of terminal 
actions and their losses, any loss vector attainable with Q is also at- 
tainable with P. For a given n, the least informative experiment is 
one with identical rows, so that the distribution of x is independent of 
w; the most informative experiment is the n X n identity matrix, where 
observing x identifies w; and other experiments are intermediate. Gen- 
erally speaking, two experiments will not be comparable; i.e., neither 
will be more informative than the other. 


Theorem 12.2.2. Let P, Q be n X Ny, n X No Markov matrices, 
i.e., any two experiments with the same n. Each of the following con- 
ditions is equivalent to P D Q. 


(1) For every Nz X n matrix C there is an Ny X N Markov matrix 


M such that every diagonal element of PMC equals the corresponding 
diagonal element of QC. 


(2) There is an Ny X No Markov matrix M with PM = Q. 


(3) For every Nz X n matrix C there is an Ny X Ng Markov matrix 
M with 


Trace of PMC < trace of QC 


(The trace of a matrix is the sum of its diagonal elements.) 
(4) Writing 


aj = Li Piss B; = Do a 
i i 


e EnA) 
a S B; 


for any continuous convex function ¢ defined for all y = (y1, ***, Yn) 
n 


with y; > 0, >> y; = 1 we have 
i 


Nı Ne 
xu ajo(ej) > Xu Bieli) 
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(5) With aj, 8;, ez, fj as in (4), there is an Np X N, Markov matrix T 
with 


Ni 

(a) X ter =f; for j=1,-+-,Ne 
k=l 
No 

b) DX Biljik = ak for K=1,++-, N; 
j=1 


Proof. P >Q = (1). (The symbol = stands for “implies.”) Let 
C be any N X n matrix, and let A be the convex hull determined by 
the rows of C. Then C is a possible decision function in (Q, A). Any 
decision function in (P, A) is an Ny X n matrix D, each row of which 
is a convex linear combination of the rows of C, and so there is an 
Ny X Na Markov matrix M with D = MC. Choosing D with b(P, D) 
= b(Q, A) yields an M satisfying (1). 

(1) = PDQ. Let A be any closed bounded convex set in n space, 
and let C be any decision function in (Q, A); i.e., C isan No X n matrix, 
each row of which is a point of A. From (1), there is an N, X No 
Markov matrix Mf with b(P, MC) = b(Q, C). Since A is convex, MC = 
D isa decision function in (P, A), so that b(Q, C) e B(P, A) and B(Q, A) 
C B(P, A). 

The implications (2) = (1) = (3) are clear. (3) = (2): Consider 
the game in which I chooses any Ne X n matrix C with 0 < cj; < 1 for 
all j, 4; II chooses any Nı X Nə Markov matrix M, with payoff I(C, M) 
= trace of [(PM_— Q)C]. Since the sets of strategies are closed bounded 
Convex subsets of Non and NıNo space respectively and the payoff II 
1S bilinear, by corollary 2 of Theorem 2.5.1 the game has a value vo, 
and the players have good strategies Co, Mo; i.e, (Co, M) > vo > 
T(C, Mo) for all C, M. From (3) there is an M, with 1(Co, M) < 0, 
80 that v <0. Thus I1(C, Mo) < 0 for all C; i.e., the trace of UC is 
X0 for all C with 0 < cji < 1, where U = PMo — Q. Then uy < 0 
or all i, j, for the choice Ciao = 1, Cji = 0 otherwise, makes the trace 
of UC equal Wiggs Since PM and Q are both n X Nə Markov matrices, 
: “iz = 0 for every i, so that u;; = 0 for all i, j, and PM = Q. 


4) = (3): Let C be any Na X n matrix, and let (y) = — min 
J 


n 
È Si. For any Na X Na Markov matrix V, 


n, No, No Nz Me ii F 
Trace of QVC = quitjntes = D b; D vi (> g (3) 
j=l k=l i=l j 


i=1,j=1,k=1 


No No 


No 
> — BD vred) = — x Belfi) 
j=l 


k=1 
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with equality if V is chosen so that v;, = 1 for the smallest k with 
ee () = —¢(f)) 
i=1 


We have shown that 


min [trace of QYC] = — = Biol) 
v j= 


A similar argument shows that 


Ni 
min [trace of PMC] = — >> ajo(e;) 
M j=l 


where M varies over all Ny X N Markov matrices. Consequently (4) 

implies the existence of an Ny X Nə Markov matrix M with trace of 

PMC < trace of QVC for all V. In particular, choosing for V the 

N2 X Nə identity matrix yields (3). 
(2) = (5): Define 


QkMkj AkMkj 
bk = —— = 
Nı 


B; 
Do armij 
k=1 


Cleary T is an No X Ny Markov matrix. 


> lrer is 


k=1 
Nı akM; Pik l] Mı my 
kai \ Bj ak Bj k=1 B 


= ith coordinate of fj, proving (a) 


The ith coordinate of 


N2 N2 
Also >> Bitr = >> Mj = a, proving (b) 
j=l j=1 


(5) = (4): We have 


Ni 
x argpler) = 3 Epi Bitjro(er) 


k=1 j=1 


Na 
> = Bip (> tner) = D Bil) 
j= j=l 

We remark that (2) asserts that, if we observe the result of experi- 
ment P and, when the result is x = j, select y = k with probability 


Mir, k= 1, , No, the resulting y is equivalent to experiment Q; 
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thus (2) asserts that Q can be duplicated from P with the aid of a table 
of random numbers. Equivalence of (2) with P D Q implies that P 
is more informative than Q if and only if Q can be duplicated from P. 

Condition (3) also has a simple interpretation, for the trace of QC is 
n times the expected loss from decision function C and experiment Q 
where all alternatives w; are equiprobable and the trace of PMC has a 
similar interpretation. Thus P D Q if and only if for any A the Bayes 
risk when all alternatives are equiprobable in (P, A) is less than or 
equal to the corresponding Bayes risk in (Q, A). 

A sixth equivalent condition is that for every A, in the decision prob- 
lem in which the statistician may without cost observe either P or Q 
but not both, followed by the choice of a e A, the class of decision func- 
tions that involves observing P is a complete class; i.e., 

(6) For every A, the set T of admissible points of S, the convex set 
determined by the union of B(P, A) and B(Q, A) is a subset of B(P, A). 

For P D Q implies S = B(P, A), so that (6) follows, and (6) implies 
(3), in view of the interpretation of (3) given above. 


12.3. Combinations of Experiments 


If P and Q are n X N, and n X Ns experiments, both of which may 
be performed, the combined experiment (P, Q) corresponds to an 
n X NiNa matrix R with rij. = Prob {x = j, y = k | w = wi} where z, 
Y denote the result of P and Q. If P and Q are independent, i.e., if x 
and y are independent under each w;, we have Tijk = PisGik- 

Consider any m X n matrix A, and any r X s matrix B. We define 
the rm X sn “partitioned” matrix (a;;B) to be 


aB +++ aynB 


ämıB ©! amn B 


where the ijth element a;;B is itself an r X s matrix. (We remark that 
© matrix multiplication rule of rows by columns holds for partitioned 
Matrices.) 

If P and Q are independent experiments, the matrix associated with 
the combined experiment (P, Q) may be written as a partitioned matrix 
(P) where P; is a matrix consisting of the 7th row of P. This repre- 
Sentation of a combined experiment will facilitate the proofs of the next 
two theorems. 
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Theorem 12.3.1. If P, S aren X Ni, n X No experiments and R is 
an n X N; experiment which is independent of both, then P D S im- 


piee (P, R) > (S, R) 

Proof. If PDS, there is an Ny X Ne Markov matrix M with 
PM =S. Let M* = (6;;M), 6; = 1if 1 <i = 7 < Ng, ôy = 0 other- 
wise. M* isa NNa X N2N3 Markov matrix. Then 

(P, R)M* = (rP) (èM) = (ry7PM) = (ry8S:) = (S, R) 
Therefore, by condition 2, Theorem 12.2.2, 
(P, R) D (S, R) 

Theorem 12.3.2. If PD S® and P® > S2, PP and P® are 

independent and S and S® are independent, then 
(P®, P®) > (59, s9) 
Proof. If P® > S® and P® > 8S, then there are Markov mat- 


rices M® and M® such that POM = SM and POM? = 8P. 
Let M = (mM), then M is a Markov matrix and 


(P®, P®)M a (pa P P:P) (mP MY) = (POM E pin? my?) 
= (538) = (S0, s9) ' 


where m;;, p:;, and s;; are the general elements of M®, P®, 
and S“, respectively, for k = 1, 2. Therefore, by condition (2), Theo- 
rem 12.2.2, (P®, P®) > (SM, s2), 

The extension of Theorem 12.3.2 to N independent experiments is 
clear. In particular, we obtain that, if P D> S, N independent observa- 


tions from P are more informative than N independent observations 
from S. 


12.4. Dichotomies 


For two given experiments P, Q, neither the definition of D nor any 
of the six equivalent conditions of Theorem 12.2.2 yields a systematic 
method for deciding whether PDQ. For the case of dichotomies, 


i.e., n = 2, there is a seventh equivalent condition which is useful in 
deciding whether P D Q. It is given by 


Theorem 12.4.1. For any experiment P with n = 2, write Fp(t) = 
2 a, Crd) =Í Fp(u) du. Then Cp(i) is an increasing convex func- 


Pij Sajt 
tion of ¢ on 0 < ¢ < 1 with Cp(1) = 1. P >Q if and only if Cp(t) = 
Colt) for all t,O <t#< 1. 
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Proof. Define 
ft uw = (t-u) for OS usi<l 
=0 for t<u<l 


For each £, f(t, u) is convex in uso that g(t, y) = f(t, yı) is a convex func- 
tion of y = (yı, y2) defined over y: > 0, yı + y2 = 1. We have 


uh Pij 
YS ajli e) = Le oj (: = 2s) 
1 prj Sajt Qj 
pe) — È mi 


Pij Sajt 


Also, integrating by parts 


t t p F, 
Cp(t) = uFplu)| — it udFp(u) = Fp) — È ay (=) 
0 0 Pij Sajt Qj 


Nı 
= LD ajy(t, ej) 
1 
Thus Cp(é) > Colt) for all ¢ if and only if 
(i) Dajv(e;) > TBie(fi) 


for all convex ¢(y) = f(t, y) for some t. Moreover, for linear gly) = 
Ay +B 


Laje(e;) = A È prj + BEaj = A + 2B = 28;0(I;) 
J 


Then Cp(é) > Colt) for all ¢ is equivalent to (i) for all functions (y) = 
Zhif (le, y) + Uy) where hy > 0 and l is linear. Since every continu- 
ous convex function can be uniformly approximated by ¢ of this form, 
Cp(t) > Colt) for all ¢ is equivalent to condition (4) of Theorem 12.2.2. 
Finally Cp(1) = Say — py = 1- 


J 
For dikotomi, P D Q has a simple description in terms of errors of 
the first and second kind, as follows: If we consider the problem of 
testing w; against w2, a test, i.e., decision function, is a function 9, 
0 < g(x) < 1 for all x, specifying the probability with which we ac- 
cept we when x is observed. For any ¢, write 


ap(y) = pije; 

Belg) = Zpa;(1 — ¢%) 
The set Sp of possible pairs ap(y), Bp(y) is clearly a closed bounded 
Convex subset of the plane; let Bp(a) = ree Bele), 0<a< 1l. Then 


aply) =a 
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Bp(a) is a convex decreasing function of œ associated with the dichotomy 
P, specifying the minimum attainable £ for a given a. 


Theorem 12.4.2. P D Q if and only if @p(a) < Bola) for all a. 
Proof. Clearly Bp(e) < Bq(@) for all æ if and only if fp(k) = min ap(¢) 
p 
+ kBp(¢) < folk) = min ag(¢) + kBo(¢) for all k > 0. Now ap(v) + 


e 
kBple) = k + Ze;(pı; — kpz;) is minimized for j= 1 when pı; < 
kpzj, p; = 0 when pı; > kpo;, so that, in the notation of (4) of Theorem 
12.2.2, 


f, k 2j 
fel) = a; (=) + > ai( ai = Zajpr(e;) 


kpj 2Pij Qj kpzj <pij 


where 
Ve(Y1, Yo) = min (yı, kyə) 


Similarly fo(k) = Z8;4:(J;), so that Bp(a) < Bola) for all æ if and only if 
G) Zajprle;) < EBel Sj) forall k> o0 


As in the proof of Theorem 12.4.1, every continuous concave y can be 
uniformly approximated by functions Dh; + l, where hj > 0 and l is 
linear, so that (i) is equivalent to 


Zab(e;) < EBA) 


for all concave y, which is equivalent to (4) of Theorem 12.2.2. 


12.5. Binomial Dichotomies 
For a sample of size 1, a binomial dichotomy has a matrix of the form 


au 


r=] 
P2 Q2 


where qi = 1 — pj, and is characterized by the pair (pı, po). We now 
construct the function Cp(t) for this experiment. Writing 


A(pı, p2) = min = | 
Pit P2 Ut ge 


p 
B(pı, p2) = max( ai =) 
Pit P2 q +g 
we have 
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Fei) =0 for 0<ti<A 
=2 for Baisi 
=ptp of ate for A<t<B 
It is clear then that 
Ce =0 for t<A 

To obtain Cp(t) for t > B, we first observe that, if for A < t < B, 
F(t) = pit pe (or a+ 4) 
Fpl) =pı/A (oœ i/A) 
Suppose for definiteness that Fp(/) = p/A, then B = qı/(qı + q2). 
From this we easily conclude that for A < t < B 


Peli 2B-1 
HO = 3-5 


then 


Thus, for > B 
c faa +f Poa g=] 
t) = = 24t— 
'p(t) bs u i a= 


and for A <t < B 


cn -F 0-4) 


By Theorem 12.4.1, for binomial experiments P and R, P D R if and 
only if A(P) < A(R) and B(P) > B(R). [See Figure 22.] More spe- 


Cpt) 


(0,1) 


(0,0) A B (1,0) 
Figure 22 
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cifically, writing, for any dichotomy R, 


z Tij Tij 
A(R) = min (=) B(R) = max (==) 
i Nij + Toj: i Nuy tTa 
we have 


Crt) 


0 for t < A(R) 
2i—1 for t> B(R) 


Thus, if P is any binomial dichotomy of sample size 1 with A(P) < 
A(R) and B(P) > B(R), since Cp(t) = Cr(t) for t < A(P) and t> 
B(P) and Cp is linear over (A(P), B(P)) while Cp is convex over this 
interval, Cp(t) > Cr(t) for all ¢, so that P D R. We now apply these 
facts to the problem mentioned in the introduction. 

For samples of size 1, the experiments H, S, CH, CS are binomial 


dichotomies characterized by the pairs (s 3), (n, 5, (s =), 


— ô á 
(n, t=) respectively. Suppose for definiteness that 6 > hs. Com- 
puting A, B for each of H, S, CH, CS yields 


H S CH cs 
A hs ha (1 = a)(1—h) (1 = s)(1 — A) 
hs+6 ha+6 (L—s)\(1—h) +1—h—-s+6|]Q—o)0—A)+1—h—st+s 
B h(l — 8) a(1 — A) a(1 — h) h(l — 8) 
hla) +h—6 | s(l—h)+s—6 a(l—h)+38—6 AQ — s) th 


Since A(H) = A(S), we have H DS if and only if B(H) > B(S); i.e., 

h(1 — s) S s(1 — h) i sA 
M=8Fh-i M-P re s i.e., (6 — hs)(s — h) > 0; i.e., 
h < s. Comparing H with CH, CS yields H D CH if and only if h < s 
and h < 1 — s, and H D CS if and only if h < 1 — s. Thus H is the 
most informative of the four experiments if and only if h is the smallest 
of the four numbers, h, s, 1 — h, 1 — s. The computations for ô < hs 
yield the same conclusion. If h is the smallest: of the four numbers, it 
is easily verified that S D CH, CS D CH, while S and CS are not com- 
parable. The extension of Theorem 12.3.1 to samples of size n guaran- 
tees that, if h is the smallest of the four numbers h, s, 1 — h, 1 — s, a 
sample of n H’s is more informative than any other sample of n indi- 
viduals, selected from H, S, CH, and CS in specified amounts. Finally, 
selection at random from the general population can be considered as a 
mixture of experiments H, CH with probabilities h, 1 — h, so that 
H > CH implies that H is more informative than random sampling 
from the general population. 
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Binomial distribution, example, of dom- 
inating monotone procedure, 194 
of exponential type, 179 
of sufficient partition, 222 
examples of statistical games, 113, 124, 
169, 170, 179, 207, 214, 218, 229, 
245, 266, 300, 307 
Binomial variable, example of space of 
outcomes, 76, 90 
Boundaries for sequential-probability 
ratio test, 267, 277 
Boundary point, 31 
relative, 32 
Bounded, from below, 294 
concept of, 83 e 
requirement for loss function, 82, 83, 
142, 294 
Bounded set, 31 


Characteristic function of a set, 80 
Chi-square distribution, 180, 319, 320 
Classification problem, 157 
Class of functions, on A X 9, 216 
sequential decision, definition, 94 
Class of strategies, admissible, 
pleteness conditions, 141 
definition, 137 
example, 122 
in S games, 125 
Bayes, completeness conditions, 140 
definition, 136 
in S games, 125 
«admissible, completeness of, 139 
definition, 136 
example, 137 
«Bayes, completeness conditions, 140 
definition, 135 
example, 137 
extended admissible, completeness con- 
ditions, 140 
definition, 136 


com- 
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Class of strategies, extended admissible, 
example, 137 
extended Bayes, completeness con- 
ditions, 140 
definition, 136 
example, 137 
in S games, 124-131 
set-theoretic relations among, 127, 
130, 138 
optimal, definitions, 135-137 
for games with finite 9, 146 
general discussion, 121-123 
Closed set, 31 
Closure of Bayes solutions, 126 
Combination of experiments, 331 
Combination of independent experi- 


ments, 332 
Comparison of experiments, concept of, 
324 


equivalent conditions for, 326-331 
example, 325, 326, 336 
for binomial dichotomies, 334, 335 
for dichotomies, 332 
Comparison of strategies, concept of at 
r least as good, 133 
concept of better, 121, 133 
Complementary set, 31 
Complete class, conditions for, 128, 130, 
139, 140 
definition, 137 
equivalence to essentially complete 
_ class, 137 
example, 122 
general discussion, 121, 122 
in games, with finite A, 199 
with finite 9, 145, 146 
in S games, 124-131 
of pure strategies, 294, 312 
Completely reduced game, 12 
Composite hypotheses, convexity of (a, 
B) set, 201, 202 
minimax approach, 202-207 
statement of problem, 200, 201 
Composition of functions, 76 
Concave functions, 38, 147, 244 
Conditional expectation, 84, 239 
definition, 84, 85 
of convex function, 101, 295 
properties, 84, 101, 239 
Conditional probability, definition, 84 
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Conditional risk function, definition, 
87 
see also A posteriori risk 
Confidence coefficient, 305 
for scale parameter, 318 
for translation parameter, 314 
Convergence theorems, for Bayes pro- 
cedures, 248 
for expectations, 100, 101 
Convex functions, definition, 38 
expectation of, 41, 101, 295 
properties, 39-42, 295 
sequences of, 42 
Convex hull, 36, 38, 42, 48 
definition, 32 ` 
Convex loss function, consequence for 
estimation, 294 
Convex mixture, 48 
Convex payoff function, 51-54 
Convex sets, 30-38, 42, 50 
Cost function, definition, 95 
Critical region, 82 
Cumulative distribution function, 88 
Cylinder set, definition, 92 
property of stopping regions, 93, 
238 P 


Decision function, as a partition, 81 
as sequence of vectors, 145 
definition, 81 
randomized, see Randomized strategies 
terminal, 93, 94-96, 238, 241 
truncated sequential, 91, 94 
see also Strategies 
Decision problem, general formulation, 
103 
Decision space, 78, 81 
Densities, 88, 146, 171, 233, 313 
Density function, as extended mixed 
strategy, 55, 56, 58 
definition, 56 
Derived games Gy, class of strategies in, 
egood, 135 
minimax, 135 
definition, 133 
value, 134 
lower, 133, 134 
upper, 133, 134 
Diameter of a set, 31 
Discriminant analysis, 157, 158 
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Distributions, with monotone likelihood 
ratio, 187 
see also Probability distributions 


eadmissible strategies, completeness of 
class, 139 
definition, 136 
«Bayes strategies, 
ditions, 140 
definition, 135 
«good strategies, definition, 46, 47 
Equivalent games, definition, 11 
examples, 11, 12, 124, 125 
invariance under, 12, 16, 17, 28 
Equivalent methods of randomization, 
172, 214, 216 
Equivalent strategies, definition, 133 
Error, absolute, 301-305, 315 
first and second kind, 200 
relative, 305, 318-320 
Essentially complete class, 137, 140, 141, 
146 
definition, 137 
Estimate of Poisson parameter, 82 
of bounded relative error, 318-320 
Estimates, admissible, 296, 320-323 
‘based on sufficient statistics, 294-296 
Bayes, 296-305 
biased, 299 
multiplicative property of, 317 
non-randomized, 294, 312 
of bounded error, absolute, 304, 305, 
315 
relative, 304, 305, 318 
of fixed sample size, 323 
of proportion defective in finite lot, 
144, 165, 167, 169, 170 
of scale parameter, 316-320 
of translation parameter, 307-315 
translation property of, 309 
Estimation problem, structure of, 294 
Event, concept of, 76 
probability of, 77 
Events, independence of, 96 
Expectation, conditional, 84, 85, 101, 
239, 295 
convergence theorems, 100, 101 
definition, 80 
for exponential class, 193 


completeness con- 
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Expected sample size, 271, 292 
approximation for, 275 
Expected value, see Expectation 
Experiment, concept of, 79, 324, 326 
Experiments, combination of, 331, 332 
comparison of, 324-336 
Exponential class, 179-194 
admissible minimax estimates for, 
320-323 
characterization of 
pling plan, 277 
monotone procedures for, 181-193 
natural range of, 193 
randomized strategies for, 180 
Extended strategies, admissible, com- 
pleteness conditions, 140 
definition, 136 
Bayes, completeness conditions, 140 
definition, 136 
mixed, definition, 55 
densities, 55, 56, 58 
general discussion, 54 
Extreme point, definition, 37 
Extreme points, 37, 38 
of sets of good strategies, 67, 147 


sequential-sam- 


Finite games, as S games, 49 
definition, 12 
fundamental theorem on, 42 
good strategies in, 42, 44, 131, 132 
characterization of spaces of, 47, 67 
matrix of, 12 
solution of, 62, 65, 67 
value, 42 
Finite group, 227, 229 
Finite population, sampling from, 229- 
233 
Fixed sample-size experiment, 81 
Fixed sample-size games, definition, 82 
with finite A, 171-207 
with finite 2, 143-155 
Fixed sample-size procedures as optimal, 
254, 323 
Function, bounded from below, 50 
characteristic, 80 
concave, 38 
convex, 38-42, 295 
decision, 81 
loss, 82 
on product space, 76 
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Function, risk, 82 

utility, 105 
Functions, composition of, 76 
Fundamental identity, 274, 275 


Gambler’s ruin, 277 
Games, completely reduced, 12 
constant sum, 10 
description, in terms of moves, 1-3 
in terms of tree, 2-5 
examples, 13, 14 
isomorphic, 13 
mixed extension, 26 
normal form, 8-11 
reduction, 11 
solution of, 60-69 
strategies in, 3-8 
maximin, 27 
minimax, 27 
mixed, 26 
pure, 26 
two-person, 9 
value, 27 
lower pure, 15 
pure, 16 
upper pure, 15 
zero sum, 9, 10 
see also Derived games, Equivalent 
games, Finite games, Fixed sam- 
ple-size games, S games, Sequen- 
tial games, Statistical games 
Games G,, see Derived games 
Games of perfect information, 3, 17-22 
Good strategies, definition, 16 
for convex payoff, 51-53 
for nature, 154, 193, 198, 290, 291 
for statistician, see Minimax strategies 
in finite games, 42, 44, 49 
characterization of, 17, 65-69 
in S games, 49 
properties, 47, 60 
uniqueness of, 53, 57 
see also Minimax strategies, Mixed 
strategies 
Group, admissible, 224 
see also Admissible group 
definition, 223 


Half-space, 31 
Hyperplane, 31 


INDEX 


Hyperplane, perpendicular to line, 31 
separating, 31, 33-36, 50 
supporting, 35, 37 

Hypothesis, composite, 200 
simple, 200 
two-sided alternatives, 162 


Inadmissible minimax strategies, 123, 
124 
Independence of subpartition, 98 
Independent partitions, definition, 96 
Independent random variables, defini- 
tion, 96 
properties, 97, 99 
Independent subsets, definition, 96 
Index set, 77 
Indifference, among vectors, 118, 119, 
120 
as preference relation, 104 
Infimum, definition, 15 
relation to supremum, 148 
Information sets, 3, 4 
Inner point, 31, 42 
relative, 32 
Inner product, 31 
supremum of, 40 
Intersection of sets, 31 
Invariance principle, for Student’s hy- 
pothesis, 235 
for translation parameter, 307-315 
general discussion, 208, 209 
special case with infinite group, 233- 
236 
Invariant estimates, example of non- 
minimax, 313 
minimax property, 310 
non-randomization of, 312 
of bounded error, absolute, 314, 315 
relative, 317, 318 
Invariant minimax strategy, existence of, 
226, 227 
inadmissibility, of, 315 
Invariant procedure, 225 
Invariant strategy, definition, 225 
Isomorphic games, 13 


Joint distribution, definition, 79 
of random variable and parameter, 85, 


86 
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Least favorable distribution, 169, 280, 
281, 323 
see also Good strategies for nature 
Length of vector, 31 
Likelihood ratio, 152, 221 
Limit inferior, 101 
Limit superior, 101 
Line, 30 
Line segment, 31 
Loss function, convex, 294 
definition, 82 
depending on absolute error, 801- 
303 
for monotone procedures, 182 
quadratic, 297-301 
Lower pure value, definition, 15 
in mixed extension, 26 
Lower value of derived games G,, 133, 
134 


Mapping onto, concept of, 6 
Markov matrix, characterizing an ex- 
periment, 326 
concept of, 326 
Matching pennies, good mixed strategies 
for, 27 
with imperfect spying, 13 
with spying, 13 
Matrix of finite games, definition, 12 
Maximin strategy, definition, 27 
Maximum likelihood estimate, 305 
Mean value, see Expectation 
Median, definition, 302 
property, 302 
Minimal complete class, equivalence to 
admissible class, 140 
example of non-existence, 132 
for monotone procedures, 187 
Minimal sufficient partition, definition, 
218 
examples, 221, 222 
existence of, 221 
Minimal sufficient statistic, 218 
Minimax invariant estimates, 314, 315, 
317, 318 
Minimax loss, see Minimax regret 
Minimax regret, general discussion of, 
114 
in S games, 149 
objections to, 115-117 
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Minimax principle, general discussion, 
113 
objections to, 113, 115 
Minimax procedures, see Minimax strat- 
egies 
Minimax risk, 155, 280, 281 
see also Value in statistical games 
Minimax strategies, class of, 135 
definition, 27 
inadmissible, 123, 124 
in classification problem, 159 
in dichotomies, 154, 155 
in fixed sample-size games, 195 
in sequential games, 290, 291 
invariant, 226, 227, 233 
monotone form of, 189 
Mixed extension, definition, 26 
Mixed good strategy, 27 
Mixed strategies, 24, 25 
definition, 26 
extended, 54, 55 
in single-experiment games, 83 
Modal interval, 305, 307 
Moment-generating function, of sample 
size, 275 
properties, 273, 321 
Monotone decision procedures, 181-194 
admissibility of, 187 
as minimax strategies, 189 
Bayes, 186 
completeness of, 181, 182 
for densities, 187 
uniqueness of, 186 
More informative experiments, defini- 
tion, 328 
Moves, 1, 2 
chance, 1 
choice at, 1 
irrelevant, 15 
outcome of, 1 
over-all chance, 6 
personal, 1 
Multidecision games, definition, 171 
see also Statistical games with finite 
A 
Multiple classification, 157-161 
Multivariate normal distribution, 158 


n-faced die, 161, 229 
n space, 30 
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Neyman-Pearson theory, 170 
Normal distribution, example, of ex- 
ponential type, 180 
of least favorable, 198 
of sufficient statistic, 223 
examples of statistical games, 122, 156, 
162, 179, 194, 198, 207, 224, 235 
Normal form of a game, 8-11 
definition, 10 
description, 9 
example, 6 
Null set, 31 
Number of observations, see Sample size 


Open set, 31 

Operating characteristic function, see 
Class of functions on A X9, 
Probability of a given terminal 
decision 

Optimal property, likelihood ratio test, 
152, 292 

Optimal sequential-sampling plan, exis- 
tence of, 243 

Optimal strategies, general discussion, 
121-123 

see also Principles of choice 

Optimal terminal-decision regions, char- 
acterization of, 259, 260 

Optimal terminal decisions, existence of, 
241 

Over-all chance move, 6 


@g, class of probability distributions on 
Z, T7 
Parameter space, 77 
Particle counting, 82, 318-320 
Partition, as a sequential-sampling plan, 
93 
as informative as Z, definition, 216 
definition, 76 
determined by a function, 76, 80 
sufficient, see Sufficient partition 
Partitions, independent, 96 
Payoff function, convex, 51-54 
definition, 10, 11 
Perfect-information games, 3, 17-24 
definition, 18 
example, 14, 20 
existence, of good strategies in, 19, 24 
of pure value in, 19, 24 


INDEX 


Perfect-information games, existence, of 
saddle point in, 21 
order of, 18 
reduction of matrix to single element, 
21, 22 
solution by inductive method, 20 
Poisson distribution, example, of ex- 
ponential type, 179 
of sufficient statistic, 213 
examples of statistical games, 82, 148, 
149, 177, 318 
Prediction problem as Bayes estimation 
problem, 305, 306 
Preference relation, among vectors, 118 
on outcome space, definition, 104 
Principles of choice, axioms for, 116-120 
Bayes, 116 
example, 111 
for vectors, 118 
general discussion, 111-116 
in general games, 112 
minimax, 113 
minimax loss or regret, 114, 116, 117 
Probability, of a given terminal decision 
approximation for, 275-277 
of termination in sequential games, 


270-277 
rate of convergence to unity, 270, 
271, 275 
Probability distributions, arisingin games, 
6, 102 


conditional, 84 

definition, 26 

joint, 79, 85, 86 

on action space, 103, 216 

on outcome space, 6, 102 

utility function on, space of, 105 

utility of, 9, 102, 103, 105, 110 
Probability ratio, 152, 221 
Pure strategies, completeness of class, 

146, 294, 312 

definition, 26 

for nature, 77 

for statistician, 78, 81, 294, 312 
Pure value, definition, 16 


Quadratic loss function, 297 
estimates for, Bayes, 299 


invariant, 314 ) 
existence of conditional risk, 298 
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Randomization, equivalent methods, 
172, 214-216 
Randomized invariant strategy, 225 
Randomized strategies, convergent sub- 
sequence of, 197 
definition, 87 
see also Mixed strategies 
Random sampling, 233 
Random variables, definition, 79 
independent, 96, 97, 99 
involving densities, 88 
Rectangular distribution, estimation of 
range, 320 
sequential, 320 
example of least favorable, 323 
examples of statistical games, 307, 
315, 316 
monotone procedure for, 187 
Reduction of games, definition, 11 
Relative boundary point, 32 
Relative inner point, 32 
Risk function, conditional, see A pos- 
teriori risk 
definition, 82, 87, 88, 95 
for densities, 88 
lower and upper bounds for, 256, 265 
upper bound for, 251 


S games, admissible strategies, 125 
Bayes strategies, 125 
closure of, 126 
geometric characterization of, 125, 
126 
classes of strategies, complete, 124- 
132, 146 
completeness conditions for, 128, 130 
counterexamples to equality of, 127- 
129 
set-theoretic relations among, 127 
definition, 47 
good strategies, 49 
value, 49 
see also Fixed sample-size games with 
finite Q 
Saddle point, 20, 21 
Sample size, definition, 81 
distribution of, 270, 271, 275 
expected value of, 271, 275, 292 
moment-generating function, 275 
optimal, 170, 292 
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Sample space, definition, 77 
general discussion, 76 
generated by random variable, 79 
Sampling from finite population, 229-233 
minimax invariant strategy, 233 
Scale-parameter problem, 316-319 
Separating hyperplane, 33-36, 42, 50 
definition, 31 
Sequential analysis, theory, 270-277 
Sequential estimates, examples 245, 254, 
255, 307, 316, 320 
Sequential games, Bayes procedures, for 
finite 2, 257-260 
for finite 2 and A, 261, 262 
general, 238-245 
Bayes risk, concavity of, 257 
continuity of, 257 
examples, of dichotomies, 280, 281 
of trichotomies, 282-290 
existence of optimal terminal de- 
cisions, 241 
expected sample size, 271 
experimental-design problem, 256 
fundamental identity, 274 
general discussion, 89-94, 237, 238 
optimal sampling procedures, 243 
probability, of a given terminal de- 
cision, 275-277 
of termination, 270-275 
Sequential-probability ratio test, defini- 
tion, 267 
optimal property, 292 
Sequential procedure as truncation of 
another procedure, 251 
Sequential procedures, Bayes, see Bayes 
sequential procedures 
non-truncated, 248 
truncated, see Truncated sequential 
games, Truncated sequential pro- 
cedures 
Sequential-sampling plan, 93, 243 
Set, bounded, 31 
closed, 31 
closure of, 31 
complementary, 31 
convex, 30-38, 42, 50 
diameter of, 31 
null, 31 
open, 31 
single element, 76 
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Set-theoretic relations among classes of 
strategies, 138-140 
Sets, intersection of, 31 
union of, 31 
Simple hypothesis, 153, 200 
Simple solution, definition, 65 
Single-element set, 76 
Single-experiment game, definition, 82 
example, 82 
Single experiments, discussion, 81 
Size of sample, see Sample size 
Solving games, 60-74 
examples, 61, 62, 64, 69 
Space, of a posteriori probabilities, 84, 
146, 245 
of a priori probabilities, 84, 146, 245 
of good strategies, characterization of 
in finite games, 47, 67 
of outcomes, 6, 102 
of strategies, 94 
definition, 10, 11 
for nature, 77 
for statistician, 78, 81 
in general games, 4-6, 102, 103, 111 
see also Mixed strategies, Randomized 
strategies, Space of a priori proba- 
bilities 
Sphere, definition, 31 
Statistical games, general discussion, 75- 
83 
with finite A, 171-207, 261-293 
with finite 2, 143-170, 257-293 
see also Fixed sample-size games, Se- 
quential games 
Statistician’s strategy in truncated se- 
quential games, general definition, 
94 
Stopping regions, definition, 238 
determination of, 263-270 
for sequential trichotomies, 282-290 
for special dichotomies, 278-282 
Strategies, admissible, see Admissible 
strategies, Class of admissible 
strategies 
Bayes fixed sample size, see Bayes 
solutions, Bayes strategies, Class 
of Bayes strategies 
Bayes sequential, see Bayes sequential 
procedures 
closed under equivalence, 137 
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Strategies, comparison of, 123, 133 
complete class of, 121, 124-132 
definition, 10, 11 
eadmissible, 136 
«Bayes, 135 
e-good in derived games, 135 
extended admissible, 136 
extended Bayes, 136 


good, 16 
inadmissible minimax, 123, 124 
inferior, 22 * » 


invariant, 225 
maximin, 27 
minimax, 27 
in derived games, 135 
mixed, 24-26 
mixed good, 27 
pure, 77, 78, 81, 294, 312 
randomized, 83, 85, 87 
Strategy, concept of, 3, 4 
Subpartitions, definition, 98 
determined by functions, 212 
existence of coarsest, 219 
independence of, 98 
Sufficient partition, as informative as Z, 
217 
conditions for, 210 
definition, 210 
minimal, 218 
Sufficient statistics, definition, 211 
factorization theorem, 211-212 
for densities, 223 
general discussion, 208, 209 
minimal, 218 
Supporting hyperplane, 35 
containing extreme points, 37 
Supremum, 17, 40, 42 
definition, 15 
relation to infimum, 148 


Terminal decision, approximation for 
probability of, 275-277 


Terminal-decision rule, 93, 94 
existence of optimal, 241 
Trace of matrix, definition, 328 
Translation-parameter problem, 307-315 
Truncated sequential games, Bayes pro- 
cedures, 238-245 
definition, 95 
Truncated sequential procedures, 90-95, 
237, 238 
as optimal, 254 
equivalence of two characterizations, 
93 
general definition, 94 
Two-person game, 9 


Uniformly Riemann-integrable functions, 
56 
Union of sets, 31 
Upper pure value, definition, 15 
in mixed extension, 26 
Upper value in derived games, 133, 134 
Utility function, 9 
as an expectation, 105 
boundedness of, 109 
definition, 105 
existence of, 105-110 
on interval, 108 
Utility of probability distributions, 9, 
102, 103, 105, 110 


Value, definition, 27 
example of non-existence of, 59 
in derived games, 134 
in general games, 16, 42, 44, 46, 49, 134 
in S games, 49 
in statistical games, 146, 154, 195, 280, 
281 
Variance for exponential class, 193 
Vectors, length of, 31 
preference relation among, 118 


Zero-sum game, 9, 10 
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